

Psychometrika


A JOURNAL DEVOTED TO THE DEVEL- 
OPMENT OF PSYCHOLOGY AS A 
QUANTITATIVE RATIONAL SCIENCE 


















































THE PSYCHOMETRIC SOCIETY - ORGANIZED IN 1935 











VOLUME 5
NUMBER 1


MARCH 1940








PSYCHOMETRIKA, the official journal of the Psychometric Society, is devoted to the 
development of psychology as a quantitative rational science. Issued four times 
a year, on March 15, June 15, September 15, and December 15. 


MARCH, 1940, VOLUME 5, NUMBER 1 


Office of Publication: The Psychometric Corporation, 1126 E. 59th Street, Chica- 
go, Illinois. 


Subscription Price: 

To non-members, the subscription price is $5.00 per volume of four issues. 
Members of the Psychometric Society pay annual dues of $5.00, of which 
$4.50 is in payment of a subscription to Psychometrika. Student members of 
the Psychometric Society pay annual dues of $3.00, of which $2.70 is in pay- 
ment for the journal. 

The subscription price to libraries and other institutions is $10.00 per year; 
this price includes one extra copy of each issue. 


Entry as second-class matter applied for at the post office at Chicago, Illinois, 
under the Act of March 3, 1879. 


Applications for membership and student membership in the Psychometric So- 
ciety should be sent to 

Dr. ANNE ANASTASI 

Chairman of the Membership Committee 

Queens College 

Flushing, New York 


All membership dues are payable to 
ALBERT K. KURTZ, Treasurer
173 Cornwall Street 
Hartford, Connecticut 


All other payments to the journal, including library subscriptions, non-member 
subscriptions, and author’s payments, should be sent to 

JACK W. DUNLAP 

University of Rochester 

Rochester, New York 


(Continued on the back inside cover page) 


































Psychometrika





CONTENTS 


THE FUTURE PSYCHOLOGY OF MENTAL TRAITS
TRUMAN L. KELLEY


A FACTOR ANALYSIS OF MECHANICAL ABILITY TESTS
WILLARD HARRELL


POINTS OF NEUTRALITY IN SOCIAL ATTITUDES OF DELINQUENTS AND NON-DELINQUENTS
RUTH BISHOP


FACTORIAL INVARIANCE AND SIGNIFICANCE
GALE YOUNG AND A. S. HOUSEHOLDER


METHODS OF ITEM VALIDATION AND ABACS FOR ITEM-TEST CORRELATION AND CRITICAL RATIO OF UPPER-LOWER DIFFERENCE
CHARLES I. MOSIER AND JOHN V. McQUITTY


TIME SCORES AND FACTOR ANALYSIS
H. D. LANDAHL








VOLUME FIVE MARCH 1940 NUMBER ONE 














PSYCHOMETRIKA, VOL. 5, NO. 1
MARCH, 1940 


THE FUTURE PSYCHOLOGY OF MENTAL TRAITS* 


TRUMAN L. KELLEY 
Harvard University 


If we may take the habits, attitudes and utterances of people of 
low mentality and little independence and originality as evidence of 
a primitive type of thinking and those of thoughtful people as evi- 
dence of a more highly developed type, we can expect to find in com- 
parative sociology the mental distinctions that characterize primitive 
and more highly developed peoples. If we compare the initial stages 
of a mental discipline with the later stages we can find evidence of 
the same distinctions. If we examine the mental processes of every- 
day life which require little effort and which we can engage in in 
indolent moods and compare with those that are distinctly effortful 
and which, as we perform them, carry the thrill of awakeness and vi- 
tality, we again find evidence of the same distinctions. In each of 
these comparisons the more primitive thought is largely in terms of 
broad categories and its decisions are unambiguous, unqualified and 
lead to unquestioned conclusion, while the higher type is either def- 
initely quantitative or concerned with much finer categories and leads 
to qualified conviction and tempered judgment. 

One of the outstanding developments of modern psychology is 
the quantification of psychological phenomena. This is a sign that it 
is coming of age and passing from the realm of primitive thought to 
that of cultured understanding. It is evidence that psychology is at- 
taining the status of technology and that it will shortly serve as in- 
trinsic a function in the processes of modern society as do the engi- 
neering and medical professions. Can we picture this function in 
more detail? It will not be academic in the sense that it concerns it- 
self with rats or guinea pigs or laboratory conditions, — psychology 
of this content is now with us and doing its full share in laying the 
foundation for a sociological psychology. It will not be expressed 
merely in the fringes of society and deal with the abnormal, the feeble- 
minded, the blind, the deaf, or other cases of marginal nature. It 
will not be limited to those types of functioning that are occasional 
or mainly avocational, such as inebriation, musical appreciation or 
revivalist religion. A psychology covering all of these fields is essential
and part of what the future holds, but it is not the psychology I
venture to forecast.


* Address of the retiring President of the Psychometric Society, Stanford
University, Sept. 1939.

In the understanding of fellow man the process of progressing 
from broad categories and crude quantitative distinctions to detailed 
categories and fine quantitative differences is one of sensing life in 
its ramifications and its more delicate nuances. Once done we can 
hardly be content with gross classifications yielding so many people 
for bread making, so many for burden carrying, so many for gun fod- 
der, so many for licking boots and ordering around as needed. The 
variety of life and the subtlety of mental functioning having once 
been sensed lead inevitably to a concept of society which permits
such life to flourish, for a lesser mode of personal living and personal 
appraisement is a stultification of the genius of life. This is the psy- 
chology of freemen. It is that of democracy only, or perhaps of a neo- 
democracy that permits a fullness of functioning conditioned by the 
potentialities of being rather than by the Procrustean limitations of a 
myopic authority. It is of the nature of a steam roller to level differ- 
ences and of power politics to crush idiosyncrasy. 

The expression “only educated men are free” takes on a sad 
meaning in a tight-walled social structure. It suggests that there only 
the cunning are ingenious enough to find modes of self expression — 
and almost daily we have heart-rending illustrations from the to- 
talitarian states of men of integrity and intelligence, but without cun- 
ning, failing in this search for freedom. But if the walls are pushed 
back freedom expands, and why not? Why not expand and make sin- 
cerity and the recognition of the rights of others, not cunning, the 
measure of freedom? We can hardly envisage a society tolerating the 
criminally-minded, the borers-from-within, or those so supine they 
will not fight to preserve the boon of freedom offered them, but why 
should not all others, including those having a minimum of intelli- 
gence, — just sufficient to know whether what they do is kindly and 
productive and not the reverse, — be free? I believe this is an at- 
tainable ideal, — in fact that we possess the roots of such in these 
United States, — and it is of the psychology of such a society that 
I speak. 

Herein are men and women variously endowed. Of this we have
irrefutable evidence. Herein the socially tolerated patterns of interest
and of conduct offered these multitudinous original natures are
themselves multitudinous, with the result that life may expand to its 
fullest. This does not mean that it may grow at random, for there 
are both limitations in nervous structure and in the physical and so- 
cial conditions under which life is euthenic, even possible. These re- 
quire that it expand harmoniously. Each developing life must find
one of the many harmonies that the bow of environment can bewitch
from the strings of original nature. This process of finding calls for 
psychological insight of an order not needed in totalitarian states 
and not as yet developed in democracies. 

The intelligent freeman desires above all else to fully serve so- 
ciety and fully serve himself, or serve society while serving himself, 
or serve himself while serving society, — it matters not which way 
it is put. To serve himself requires that he function in the manner 
or manners that he is capable of, particularly in those ways that are 
most highly developed within him, and to serve society requires that 
such functioning be productive and increase the happiness or physical 
well-being of his fellow man. We are thus confronted with a single 
problem, individual adaptation, and a dual complexity, that of the 
psychological needs of society and that of the psychological assets of 
man. 

I would distinguish between the interests of man and his assets. 
Though we believe with John Dewey that interest and effort are fre- 
quently associated and that effort spent upon a task is a prime de- 
terminer of efficiency, nevertheless man’s interests are not an accu- 
rate measure of his abilities. Interest so readily becomes desire, and 
what shall we say of the desire to take the soft path, of indolence so 
ubiquitous we may not deny it? Without the pressure of economic or 
other necessity, too many would lead parasitic lives for us to counte- 
nance the belief that interest, frequently a convenient cloak for lazi- 
ness, is, of itself, creditable and entitled to respect. I knew a boy, or 
should I say a man, of age 24 whose “keen interest in art” was most 
marked at the time of the Beaux Arts Ball in Paris. He painted no 
pictures except a few impressionistic daubs which were entirely un- 
salable, and let his mother keep him in art school by doing the entire 
work of a rooming house. I know another, somewhat younger, whose 
“keen interest in science” expends itself upon pseudo-scientific maga- 
zines, but scrupulously avoids courses in mathematics and physics. 
Who has not met the person “keenly interested” in writing or music 
who behaved as though such interest justified his sponging off some- 
one else for a living? Surely interest, in and of itself, is not a key 
to psychometrics. 

Herein I see a difference between those “progressive” educators 
who feel they have accomplished something when their pupils “en- 
thuse” and find the key to educational activity in interest, and those 
who find it in achievement. Surely, whether viewed from the stand- 
point of the individual or of society, interest unfulfilled by achievement 
is a spurious coin. It neither measures social good nor lasting per- 
sonal satisfaction, for unless fortified with achievement it is ephemeral
and, as the normal reaction sets in, disappointing. To draw an
illustration from the field of statistics and measurement, I could cite 
educators who at one time were enthusiastic about this field, but who 
never attained proficiency in it and are now disappointed with it. 
They would say that they were disillusioned, feeling that they had 
investigated and found it wanting, whereas the truth is they had 
never seriously studied its potentialities and limitations nor mas- 
tered its techniques, — they had only “enthused.” The mathemati- 
cian Sylvester had such an interest in poetry that he is reported to 
have expressed the conviction that he was the greatest poet in the 
world. He was well paid for his mathematics, but not for his sorry 
poetry, and we may well believe that the world would have lost a 
genius had he followed his paramount interest. 

Assuredly, when interest stimulates study and effort it is serv- 
iceable. If a man’s interest in a first legitimate field is greater than 
in a second, though his ability is greater in the second, no one other 
than a hard-headed employer who is paying for services rendered 
will question his full right to pursue the first activity. In such a case 
it is important that the man in question know himself in terms of the 
standards that the world sets and, knowing, then choose as he wishes, 
providing he does not shirk his normal responsibilities — for there 
are such — to self, to family and to society. There is always a proviso 
that attaches to the sufficiency of interest as a determiner of conduct, 
for of itself it may be destructive of moral fiber, self-debasing, self- 
injurious, selfish, injurious to others and anti-social. It is not a long 
step from the philosophies of Rousseau and Dewey to “the world 
owes me a good time and a soft living and I shall proceed to take it,” 
which certainly would be abhorrent, if not to Rousseau, to Dewey 
with his fine sense of social responsibility. Unfortunately the world 
is not populated with John Deweys, and so far as we know there is 
no correlation between keenness of interest and sensitivity to respon- 
sibility. 

Interest without achievement is counterfeit, deceiving both the 
possessor and the observer. Only if it leads to accomplishment has 
it merit, but if it does so lead then achievement itself is the measure 
of its merit and no other or further measure is needed. 

This discussion of interest is not a diversion, but necessary to 
make clear that in picturing a psychology based upon individual ca- 
pacities, abilities and achievements there has been no inadvertent 
neglect of what some look upon as the dynamo or essential spring to 
conduct. 

The psychological problems in a society where freedom reigns 
are more numerous and more detailed than in other societies because
individuality is free to expand in such ways as it can expand and is
not confined to a few ways only. The first psychological problem that 
we have is that of determining what are these various ways. What 
are the modes of functioning of which humankind is capable, with- 
out limiting these further than is necessary to make a democratic 
society politically and economically self-supporting and enduring? 
Without such limitations the germs of degeneracy within and the dic- 
tators without would soon destroy it. Granting that a limitation of 
the sort mentioned is necessary, there still remains an increased num- 
ber of socially acceptable forms of conduct so that the psychological 
problem is that of expanded living. We may list the permissible forms 
of conduct upon the one hand and the human talents on the other. 

The first list would include all those vocations and activities that 
are useful in our going social structure and it would include each with 
a weight proportionate to its breadth of functioning; it would include 
those amenities of life, such as honesty, cooperativeness, kindliness, 
that simplify and integrate social living; it would include those little- 
developed and much-needed civil virtues that enable man to correctly 
appraise and elect candidates for positions of public trust; and it 
would include those infrequent but incomparable traits and abilities 
that lead to invention and social progress for, of course, the nation 
itself “has got to run like everything” to keep abreast of the times. 

The second list includes those types of mental processes that ful- 
fill or serve the first list, but that is but part of the story. The first 
list represents needs that are largely stable from decade to decade, 
even from generation to generation, while the mental processes that 
serve these needs are growing and maturing personal powers. They 
have a personal origin in germ cell and a personal history in educa- 
tion and unique environment. Though carpenters be needed from 
generation to generation, the fulfillment of the need is in terms of 
transitory individuals, each with a certain unique original fitness, 
who grows into and learns the vocation and who in old age drops out. 
However stable the social need the psychological problem in terms 
of individuals is one of original capacities nurtured by a tying-togeth- 
er-education and a happy functioning in adulthood. Though a tying- 
together-education is called for, as, e.g., in the case of the stenogra- 
pher who must unite such diverse things as keyboard manipulation, 
spelling, shorthand, it is probably true that there are certain natures 
in which much of this integration is genetic or original endowment. 
The discovery of such bonds early in the educative process becomes 
a prime objective of the new psychology. 

This psychology must be catholic, it must meet all comers, appraise
their assets and capabilities and serve their respective needs
and it must include all vocations and appraise their demands in turn.
Steps have been taken in the directions mentioned, but as encompass- 
ing the richness and variety of the human mind they are woefully in- 
adequate. We may readily locate important reasons for this inade- 
quacy: 

First is the fact that the social aspect of the problem is too broad 
to be envisaged by the average person. Time was, before the indus- 
trial revolution and before rapid transportation and communication, 
when a man watching life pass his front doorstep could fairly picture 
the possibilities of employment, the honorary and economic rewards 
of the same, and even the talents and training necessary for the suc- 
cessful pursuit of the same. All this is now changed. Probably in 
spite of the arts of communication the youth of today has a more 
distorted view of society entire and of the psychological demands of 
its different callings than his non-migrating, non-telegraphing, non- 
unionized, non-propagandized forbear of three generations ago. The 
youth of today is not blameworthy in this. The problem has become 
so complex that only carefully collected and organized knowledge 
meets the need. Vocational inventories no longer pass the front door 
of youth today, and above all psychological inventories of job require- 
ments do not. These are things that the new psychology must assume 
the responsibility for providing and that progressive education must 
provide the administrative means for communicating to developing 
boys and girls. 

Second is the fact that the variety of useful functioning now of- 
fered is such as to give freedom of expression to human nature in its 
tantalizing and impressive complexity, and the self-awareness of this 
variety is to be sensed only by an analysis of such detail as is beyond 
the powers of the average person without the help of a wise and 
uniquely trained psychologist. That this help is not now available 
specifically defines the task set for the new psychology. 

As a problem in personal psychology the task differs somewhat 
from what it is as a problem in professional psychology. The indi- 
vidual does not question that his interest is greater in a certain field 
than another. He may not question that his ability is greater along 
one line than another; he even may believe that his incidental and 
casual acquaintance with callings has adequately informed him of 
the nature of the demands of the callings. These things must not be 
assumed by the professional psychologist despite the conviction of 
the more vitally concerned individual. The specific requirement upon 
the psychologist is that he see and understand the individual in terms 
of a metric, background, or frame of reference, of competing indi- 
viduals. For example, a person’s mathematical ability becomes to him
a percentile rank or position in a distribution of abilities, the cases
providing the distribution being the ordinary run of competitors of 
the individual in the field, mathematics, in question. This calls for 
an objective measuring device for the ability and a survey yielding 
norms, both of these things being beyond the scope of the individual 
though quite within that of an organized profession. 
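
The notion of a percentile rank within a norm group can be sketched in a few lines. This is only an illustration: the function name, the convention of counting tied scores as half, and the ten norm scores are assumptions introduced here, not anything given in the address.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(score, norm_scores):
    """Percent of the norm group below `score`, counting ties as half a case."""
    ordered = sorted(norm_scores)
    below = bisect_left(ordered, score)          # cases strictly below the score
    ties = bisect_right(ordered, score) - below  # cases equal to the score
    return 100.0 * (below + 0.5 * ties) / len(ordered)


# Hypothetical norm group: mathematics scores of ten competitors.
norms = [42, 55, 55, 61, 64, 70, 73, 78, 85, 91]
rank_of_70 = percentile_rank(70, norms)
print(f"A score of 70 has percentile rank {rank_of_70:.1f}")
```

The same raw score would receive a different rank against a different norm group, which is why the choice of competitors matters as much as the measuring device.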

This also calls for great discernment and logical soundness in 
the choice of the mental trait rubrics used, for practical needs as well 
as economy of thought demand that a fairly small number be em- 
ployed. Happily, there is another reason for this and it is inherent 
in the nature of the human mind. I postulate that there are modes 
of mental functioning which are independent of other modes and that 
these are not statistical artifacts, but inherent in the nervous struc- 
ture. These modes are demonstrable by establishing that their train- 
ing and development carries no concomitant development in the other 
modes of which they are independent. I also postulate that such 
modes as seem to be statistical derivatives of measured mental traits 
are not artifacts, but completely meaningful, for one is at liberty to 
create and use the necessary new interpretative concepts. 

It is axiomatic that thinking in terms of independent functions 
is more expeditious and more in harmony with elementary thinking 
processes than in terms of correlated measures. A few years ago 
some educationist claimed a great discovery, — that of the “law of 
the single variable,” meaning thereby that in an experimental situa- 
tion involving several variables one and only one should be changed 
at a time and its effect judged before changing the other factors enter- 
ing into the situation. Though this was scarcely a discovery, be- 
ing as old as the beginning of scientific method, nevertheless the 
point needs iteration and reiteration and stepping down to the lay 
mind if thinking is to be precise. If two variables are correlated it 
is experimentally impossible to hold one variable constant without 
limiting the range of the second, and if, in the field of mental relation- 
ships, two traits are correlated it is impossible to think of the func- 
tioning of the one independent of the other. For self-appraisal or for 
the understanding of others, independent mental abilities must be 
thought of whenever possible. 
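
The restriction of range described above can be put in numbers with a brief sketch. It assumes a linear, homoscedastic relation between the two traits (an assumption not stated in the address), under which holding one trait constant leaves the other with a standard deviation of sd * sqrt(1 - r^2). The function name and sample values are illustrative only.

```python
import math


def residual_sd(sd_y, r):
    """SD left in the second trait once a correlated first trait is held
    constant, assuming a linear, homoscedastic relation between them."""
    return sd_y * math.sqrt(1.0 - r * r)


# Hypothetical second trait with SD 10 and increasing correlation r.
for r in (0.0, 0.5, 0.9):
    print(f"r = {r:.1f}: remaining SD = {residual_sd(10.0, r):.2f}")
```

With r = 0 nothing is lost, which is the practical advantage of thinking in independent traits.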

Two important means are at our command for bringing this 
about. First, a reduction of the field of variability to a group con- 
sisting of acutely competitive individuals. This can be accomplished, 
first, by experimentally holding constant those things which may be so 
held without destroying the practical problem set, certainly holding 
constant such things as age, sex, and race and perhaps also specific
training; second, by the proper choice of the mental dimensions employed
in measuring the complex personality.

It is entirely logical to hold constant certain variables. In the 
situations of adolescent and adult life sex should be held constant, 
for it is a sociological phenomenon that by and large keener com- 
petition exists within a sex group than between groups. That women 
become lawyers and men hospital nurses is the exception, not the 
rule, and where these exceptions occur the woman should be judged 
upon the man-made scale of lawyer ability and the man upon the 
woman’s scale of nursing ability. For callings equally participated 
in by both sexes small difference results from treating them as one 
group or as two, so treating them separately loses nothing of value 
and may gain a little. There is a similar necessity for treating separately
such races as are pronouncedly competitive intraracially.

Probably the most important groups to treat separately are dif- 
ferent age groups. A modification of this, in youth, is found in grade 
groups, but as school grade is a man-made demarcation which will 
vanish with the close of formal education and which even largely van- 
ishes as the child leaves the portals of the school for playground and 
home, the more primary age distinction would seem to be the more 
important in determining naturally competitive groups. An exception
in favor of the grade group is warranted in a problem involving
nothing except the competition existing within a single school grade.

Reducing the measurement and psychological problem to normally
homogeneous and competitive groups has an important bearing
upon the rectilinearity of the mental relationships found. We have
abundant evidence that the typical mental or physical growth curve 
is ogival in form: slow growth in infancy, rapid in the middle years 
from eight to ten, and again slow development as adulthood is ap- 
proached. Dealing with groups homogeneous with reference to age 
and sex eliminates this particular curvilinearity, to the great sim- 
plification of mathematical and psychological understanding. 

That the ogival form is typical for many, perhaps all, mental 
functions might seem at first sight to argue for the existence of a 
general intellective ability, since high correlation between all mental
traits that grow and develop with increasing age will be found in an 
age heterogeneous group. If three children, ages five, nine and fifteen, 
were measured in reading ability, arithmetic ability, and social un- 
derstanding and the results shown in correlation tables, we would 
probably find that the five-year-old was lowest and the fifteen-year- 
old highest in all three, so that a general correlation between traits 
is present, and if the data are worked up by any factor analysis meth- 
od they will show a general factor. But such a factor is not to be established
by such data. If, instead of three measures from one five-
year-old we had one each from three five-year-olds and similarly for 
the other ages and constructed correlation tables as before, we would 
still find high correlation and a large general factor, but it is here 
obviously spurious, being just a correlation between different levels 
of growth, for no child has been tested with more than one measure 
and no possibility exists for establishing a general intellective func- 
tion. Certainly the idea of individual differences was intended to 
mean more than that older children know more than younger chil- 
dren. 
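
The argument can be checked with a small simulation. The sketch below is illustrative only. It assumes two traits whose average level rises linearly with age (a simplification of the ogival curve) and whose individual deviations are independent, and the sample size, scale, and seed are hypothetical. It compares the correlation in the pooled age-heterogeneous group, the correlation within each age, and the correlation obtained when each reading score is paired with the arithmetic score of a different child of the same age.

```python
import random


def pearson(x, y):
    """Plain Pearson product-moment correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def column(rows, i):
    return [row[i] for row in rows]


random.seed(0)
AGES = (5, 9, 15)
PER_AGE = 200  # hypothetical number of children per age group

# Each record is (age, reading, arithmetic). Level depends on age only;
# the individual deviations of the two traits are drawn independently.
children = [
    (age, 10 * age + random.gauss(0, 5), 10 * age + random.gauss(0, 5))
    for age in AGES
    for _ in range(PER_AGE)
]

# Age-heterogeneous group: all ages pooled.
pooled_r = pearson(column(children, 1), column(children, 2))

# Age-homogeneous groups: one correlation per age.
within_r = {}
for age in AGES:
    group = [c for c in children if c[0] == age]
    within_r[age] = pearson(column(group, 1), column(group, 2))

# Pair each reading score with the arithmetic score of another child of
# the same age, so that no pair comes from a single individual.
mismatched = []
for age in AGES:
    group = [c for c in children if c[0] == age]
    arithmetic = column(group, 2)
    random.shuffle(arithmetic)
    mismatched.extend(zip(column(group, 1), arithmetic))
mismatched_r = pearson(column(mismatched, 0), column(mismatched, 1))

print(f"pooled across ages:       r = {pooled_r:.2f}")
for age, r in within_r.items():
    print(f"within age {age:2d}:            r = {r:.2f}")
print(f"different children paired: r = {mismatched_r:.2f}")
```

Under these assumptions the pooled correlation is high even when the scores of different children are paired, while the within-age correlations stay near zero. The pooled figure therefore reflects level of growth rather than a common function within the individual.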

A human being may well be like a garden in which entirely dif- 
ferent seeds have been planted upon a given date, — the cucumber 
becomes a creeping vine, the corn a stalk, the cabbage a head, and 
the potato a tuber. These are independent modes of living, but there 
is a close parallel in relative stages of development reached week by
week. Such parallelism does not argue for a common trait between 
them other than that they have had the same date of birth. In the 
case of the child the germ that started his arithmetical ability, the 
germ that started his reading ability, and the one that started his 
social understanding all date from the same moment, conception, so 
that should these traits in truth be utterly independent in their modes 
of functioning they nevertheless could show great similarity in levels 
of attainment at successive stages in the growth cycle. If human be- 
ings are such gardens then, when we deal with age homogeneous 
groups, the general factor due to growth disappears, it being con- 
stant for all members of the group, and the true individual blossoms 
out with all his individuality and uniqueness. 

Past studies with age heterogeneous groups have loaded the dice 
in favor of correlated as against independent traits and the resulting 
factorial pictures have been needlessly complicated. Let us give in- 
dependent mental factors a decent opportunity to show themselves; 
if they are then found present, we are indeed blessed with a gratify- 
ing simplicity of picture, though, be assured, it will not be so simple 
as to lose its charm. 

Let us assume that such disturbers of the picture of mental rela- 
tionships resident in mankind as age, sex, race, and highly specific 
training have first been taken account of and thus ruled out of the 
residual picture. All of these things are very objective and utterly 
meaningful and therefore just such things as it is desirable to retain 
as essential rubrics in the appraisal of fellow beings. They are, in 
short, just those things that we do not want to mix up with a little 
of this trait and a little of that and get a factor which is very difficult
to interpret. Having allowed for these, what may we expect the residual
picture to be like?

It would not be surprising if it changed with a change in these
things held constant, especially age and sex. The organization of men- 
tal traits in childhood may be other than that in adulthood. Even 
though it may differ in some respects from age to age and from sex 
to sex, there is already strong evidence to lead us to expect that in 
important respects the patterns of personality will remain the same 
though age and sex change. Those traits that are found to persist 
will of course have special merit as essential rubrics. Linguistic abil- 
ity promises to be one of them. It now seems that we may retain this 
rubric for all ages, excepting only the first year of life, and that it 
will serve the needs of a person or of his counselor whether schooling, 
vocation, avocation, or social contacts constitutes the issue. 

Linguistic ability is not like one of the things partialed out for it 
has no physical stigmata. Its “meaning” is a construct, a discovery, 
a matter of experience, and it is defensible solely because of its util- 
ity in answering questions pertaining to schooling, vocation, avocation 
and social life. We should peremptorily discard it if some other one 
or two related traits could be shown to have greater utility. The pur- 
pose to be served is economy of thought and none other and if lin- 
guistic ability, coupled with other trait measures, best serves this 
purpose let us keep it, but if some as yet un-named trait which is in 
a dimension oblique to linguistic ability better serves this purpose let 
us go through the necessary reconstruction of our thought processes, 
find through systematic and carefully controlled experience the full 
meaning or utility of this better trait and finally give it a name and 
honored position in our list to be measured and known. I do not an- 
ticipate that linguistic ability will thus be replaced, but as a mental 
construct it should be judged relative to the merits of other mental 
constructs and hold no preferred position because of its traditional 
status. 

Tradition is a necessary foundation for the interpreting of past 
and present conditions, but it is without distinction as a means of bet- 
ter interpreting the future. “Better” implies that something that is 
an improvement upon traditional methods must be utilized. I picture 
a new psychology that will do more than existing psychology in fully 
and accurately interpreting human nature, and I am willing to enter- 
tain the belief that present terms and concepts are inadequate and 
should be replaced by concepts which will yield a fuller, a truer and 
a more readily understandable picture. 

The two most important characteristics of these new concepts 
are that they be independent and that they touch life in its fullness. 










There are cogent reasons for a set of independent mental traits 
as opposed to one of correlated traits. One who has computed a multiple
regression equation tying several correlated variables to a criterion
and has attempted to understand the resulting weightings (a process
involving partialling out in mind from each variable the bearing of
each and every other variable) will fully appreciate the inadequacy
of the human mind to do this thing. Precise thought involving the
relationships between several correlated variables is impossible
with such powers as I am equipped with. As this inadequacy
is not due to lack of effort or training I judge that it is a real diffi- 
culty which must be circumvented in my case and that of others by 
choosing uncorrelated variables rather than overcome by a massiveness 
of mental power sufficient to cope with the complexity of correlated 
phenomena. A reader of the penetrating mathematical and logical 
analyses of R. A. Fisher quickly becomes aware of the fact that 
he solves complex problems by breaking them down into independent 
parts. We, with lesser intellects, can scarcely hope to solve them 
otherwise. 
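
The difficulty can be seen even with two predictors. The sketch below is an illustration only: the correlations are hypothetical and the helper name `beta_weights` is not from the text. It applies the standard two-predictor formula for standardized regression weights. With uncorrelated predictors each weight is simply that predictor's correlation with the criterion; with correlated predictors each weight depends on all three correlations at once.

```python
def beta_weights(r1y, r2y, r12):
    """Standardized regression weights for two predictors.

    r1y, r2y: correlations of each predictor with the criterion.
    r12: correlation between the two predictors (must not be +1 or -1).
    """
    denom = 1.0 - r12 ** 2
    b1 = (r1y - r12 * r2y) / denom
    b2 = (r2y - r12 * r1y) / denom
    return b1, b2

# Hypothetical validities of .50 and .40.
uncorrelated = beta_weights(0.5, 0.4, 0.0)  # weights equal the validities
correlated = beta_weights(0.5, 0.4, 0.6)    # weights no longer readable at sight
```

In the correlated case neither weight resembles the corresponding validity, which is the interpretive burden described above.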

I am forced to grant that for such phenomena as age and sex 
we have no option and must take them as they come, correlated or 
not. The phenomena of sex are correlated with age and as we can 
neither deny this nor circumvent it we must do the best thinking we 
can, cumbersome though it be, with these correlated variables. 

No such handicap is placed upon us with reference to the set of 
mental traits that we have the power to select from among an in- 
finitude of equivalent systems of traits, all covering the same broad 
field of mental performance. It is now well known not only by mathe- 
matical experts in multi-variate analysis, but also by psychometrists, 
that all the interrelationships recorded in a correlation matrix of n 
variables, let us say n mental measures, can equally well be shown by 
any of an infinite number of sets obtained by rotating the axes given 
by the n original variables. This set of m new variables (in certain 
instances less than n) can replace the original variables with no loss 
of detail or accuracy in the total phenomena of interrelationship at 
hand. 
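
A small numerical illustration of this invariance follows, under assumptions that are not in the text: three hypothetical tests are represented as vectors in a plane, and the correlation between two tests is taken as the scalar product of their vectors. Rotating the reference axes changes every coordinate but leaves every scalar product, and hence the whole table of interrelationships, as it was.

```python
import math

def rotate(vec, theta):
    """Coordinates of a 2-D vector after the axes are rotated by theta radians."""
    x, y = vec
    c, s = math.cos(theta), math.sin(theta)
    return (c * x + s * y, -s * x + c * y)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def correlations(coords):
    """Scalar product of every pair of test vectors."""
    names = sorted(coords)
    return {(p, q): dot(coords[p], coords[q]) for p in names for q in names}

# Hypothetical coordinates of three tests on two uncorrelated axes.
tests = {"A": (0.8, 0.6), "B": (0.6, 0.8), "C": (1.0, 0.0)}

before = correlations(tests)
rotated = {name: rotate(vec, math.radians(37)) for name, vec in tests.items()}
after = correlations(rotated)

# Largest change in any correlation caused by the rotation.
max_change = max(abs(before[key] - after[key]) for key in before)
```

The angle of 37 degrees is arbitrary; any other angle gives the same correlations, which is the sense in which the systems are equivalent.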

Generally the original variables have the merit of being defined 
by scores on existent and tangible paper-and-pencil tests while the 
derived or rotated variables are linear combinations of these pending 
someone’s devising separate tests for them, a task calling for psycho- 
logical insight and ingenuity but quite within reason to accomplish. 
I therefore credit the original variable with having a satisfying pres- 
ent meaning in terms of a presently existing test, but do not consider 
this of primary importance, for the same sort of objectivity of 
meaning is presumably obtainable with the proper expenditure of labor in 
the case of any derived measure. Should this prove not to be so after 
a serious attempt it undoubtedly would be due to psychological modes 
of functioning of the human mind which would be revealed in the at- 
tempt. Such new information would of course then be incorporated 
in the problem of search for mental factors. Until proved otherwise, 
I consider that we are at liberty to select any set of derived measures 
that we choose. 

There are certain fundamental principles which should influence 
our selection: 

The original variables should be wisely chosen and weighted so 
as to encompass the life situations which it is desired to explain psy- 
chologically. 

The factors comprising the final set should be uncorrelated. 

These factors should be ordered for magnitude; this ordering, if 
the original variables have been wisely chosen and weighted, is also 
an ordering for importance, so that if certain factors only are to be 
measured and known the more valuable ones can be selected. 

The factors comprising the final set should be as stable as pos- 
sible with changes of age, thus avoiding new factors and new inter- 
pretative devices as growth takes place. A completely unchanging 
set from infancy to old age is not to be expected, but large preference 
should be given to any traits that are stable within the individual and 
of continuing social importance from year to year. Where this sta- 
bility criterion is more or less incompatible with those of indepen- 
dence and magnitude, some practical compromise is necessary, but, 
in reaching such a compromise, one should be extremely averse to in- 
troducing correlation between traits. For example, should arithmetic 
ability be one of the independent mental traits at age 12 and heavily 
weighted at that age for social importance, but not so found or so 
weighted at age 5, it would probably be better to keep than to discard 
it at age 5, in order to preserve a continuity in rubrics and profile 
techniques, for the 5-year-old besides living in his own world is grow- 
ing into that of the 12-year-old. 

As a final practical guide the final factors should be determin- 
able with high precision and with low time, administrative and scor- 
ing cost. 

I have fully discussed the statistical means of determining the 
importance of final factors that shall touch life in its fullness in my 
book “Essential Traits of Mental Life.” I grant that the wise selec- 
tion of the initial variables in order to parallel social importance is 
a task of judgment and that the proof of the wisdom or lack of 
wisdom of such selection will become demonstrable only as the final 
factors are derived and as measures for them are devised and applied to 
representative social groups. 

It is a truism, seldom mentioned and frequently overlooked, that 
in a factor analysis undertaking any measurable mental trait may be 
magnified by the simple device of including it with varying degrees 
of purity in a sufficiently large number of the original measures en- 
tering into the matrix to be factorized, until it becomes the first or 
largest component. In order to avoid prejudicing the factorial out- 
come by the number and nature of the initial measures employed, 
Hotelling has suggested that all possible psychological measures be 
included in the original list of variables to be factorized. This pro- 
posal shows a keen awareness of the problem, but it has a shortcom- 
ing even more serious than its impracticality. It would constitute an 
attempt to make the resulting mental factors a function of mental 
abilities and organization only, whereas to be useful they must equally 
be a function of the conditions of life under which people live. 
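
The truism about magnifying a trait can be checked on a toy case. The sketch below rests on assumptions not found in the text: two hypothetical traits, measures of the same trait correlating .60 and measures of different traits correlating zero, and the largest component found by simple repeated multiplication. When each trait enters through two measures the leading components are equal; when one trait enters through six measures it becomes the first component by that fact alone.

```python
import math

def block_correlations(sizes, r):
    """Correlation matrix in which measures of the same trait correlate r
    and measures of different traits correlate zero."""
    n = sum(sizes)
    matrix = [[0.0] * n for _ in range(n)]
    start = 0
    for size in sizes:
        for i in range(start, start + size):
            for j in range(start, start + size):
                matrix[i][j] = 1.0 if i == j else r
        start += size
    return matrix

def largest_component(matrix, iterations=500):
    """Largest latent root and its unit vector, by repeated multiplication."""
    n = len(matrix)
    vector = [1.0 + 0.1 * i for i in range(n)]
    for _ in range(iterations):
        product = [sum(matrix[i][j] * vector[j] for j in range(n)) for i in range(n)]
        length = math.sqrt(sum(x * x for x in product))
        vector = [x / length for x in product]
    root = sum(vector[i] * sum(matrix[i][j] * vector[j] for j in range(n))
               for i in range(n))
    return root, vector

# Trait X in two measures, trait Y in two measures.
balanced_root, _ = largest_component(block_correlations([2, 2], 0.6))
# Trait Y now included in six measures; trait X still in two.
inflated_root, inflated_vector = largest_component(block_correlations([2, 6], 0.6))
```

In the second case the two measures of trait X receive essentially zero weight in the first component, although nothing about the traits themselves has changed.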

As an ideal approach I would suggest one involving all possible 
psychological traits, not on an equal footing, but weighted according 
to the extent to which they differentiate the social groups that it is 
desired to differentiate and according to their independence of the 
other measures involved. This matter of independence is automatical- 
ly handled in the case of final measures, for these are independent 
factors, but it can only be judged of in the case of the initial meas- 
ures employed. The extent to which a measure differentiates between 
groups is determinable by the application of that single measure to 
the groups involved. If sociability, as measured, is possessed in 
substantially different amounts in socially distinguishable groups 
(office workers versus salesmen, accountants versus ministers, miners 
versus politicians, etc.), awareness of it has a real importance in aiding 
the individual or his counselor in matters of education and vocational 
choice. If sensibility to high pitches, such as the bat’s squeak, which 
is a psychological trait in which people vary widely, is on the aver- 
age possessed in substantially the same amount in socially distin- 
guishable groups, — musicians — non-musicians, etc., — awareness 
of it has little importance and it accordingly becomes a trait to be 
omitted entirely or given very little weight on the list to be factorized. 

A mental trait under one environment may not be one under an- 
other, for in the one case it may differentiate between sociological 
groups but not do so in the other. The various tones in the Chinese 
language have meaning not paralleled in English. The mental ability 
requisite to sensing these intonations is a mental trait in Chinese cul- 
ture, but not with us. If a Chinese says té-wo (broad a, negative 
even) there may be serious sociological consequences if you think he 
has uttered ta-wo (broad a, ascending), for the first is a request that 
you answer him and the second that you beat him. Dr. Eugene Shen 
tells me of an actual occurrence in which an American hostess in 
Shanghai directed her Chinese servant to k’u-tzu (coo[l i]ts) whereas 
she should have asked him to ku-tzu ([s]choo[l i]ts). He responded 
to the first and brought the pants instead of the fruit. This may 
well have had social consequences. These distinctions, so important in 
China, need not be sensed in America, so that the sensing ability is a 
trait in China but not here. Illustrations can be drawn from differ- 
ent places in our country and different periods in our history. All 
this goes to show that a trait is inexorably connected with the en- 
vironment under which one lives. 

Those mental characteristics that have large consequence in 
terms of educational, vocational, and social adaptation are important 
and are entitled to a weight in the complex of mental performances 
factorized proportionate to such importance. In a simple multi-vari- 
ate analysis problem consisting of a matrix of correlation coefficients 
with ones in the diagonal, the measures in question have all been 
assigned an equal weight, i.e., given a variance of 1.00, and no 
correct subsequent determination of components destroys this metric, 
but what guarantee is there that these initial variances correspond to 
social importance? Generally speaking there has been no such 
guarantee, and all analyses are consequently suspect. To assert that an 
n-dimensional space, the dimensions being given by the initial 
measures factorized, each with a variance of 1.00, well represents the 
dimensions within which the individual and society function is more 
than extreme; it is absurd unless a careful survey has shown that the 
measures do represent the social universe within which we live and 
move and struggle with our problems. 
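
The effect of the initial variances on the outcome can be shown with two measures. The following sketch is hypothetical throughout (a correlation of .50 and weights of 2 and 1 are assumed for illustration). It finds the largest component of a two-by-two matrix in closed form, first with unit variances and then with the first measure given twice the weight of the second.

```python
import math

def first_component(a, b, d):
    """Largest latent root and unit vector of the symmetric matrix
    [[a, b], [b, d]], assuming b is not zero."""
    root = (a + d) / 2 + math.sqrt(((a - d) / 2) ** 2 + b ** 2)
    vx, vy = b, root - a
    length = math.hypot(vx, vy)
    return root, (vx / length, vy / length)

r = 0.5  # hypothetical correlation between the two measures

# Unit variances: the first component weights both measures equally.
equal_root, equal_vector = first_component(1.0, r, 1.0)

# Weights of 2 and 1 turn the correlation matrix into a covariance matrix.
w1, w2 = 2.0, 1.0
weighted_root, weighted_vector = first_component(w1 * w1, w1 * w2 * r, w2 * w2)
```

With unit variances the first component lies midway between the two measures; with the unequal weights it lies almost along the more heavily weighted one. The components found are thus a consequence of the weights assigned as well as of the correlations.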

I have pictured a democratic social structure requiring a wide 
variety of mental functioning and permitting one still wider in rec- 
reational and creative activities. I have pictured humankind as com- 
posed of individuals doubly individualized, — once in terms of levels 
of achievement and again in terms of combinations or patterns of 
achievement. I have pictured the new psychology as coping with this 
complexity, as having the tools and the insight for comprehending the 
major features of the geography of the individual, his level of ma- 
turity, his sundry dimensions, and his presumptive future capability 
levels in each, as having the tools and the breadth of view for com- 
prehending the broad features of the geography of society, its many 
functional psychological patterns with their respective frequency of 
occurrence which are required to serve its many needs. I have 
pictured a private and public guidance service which will utilize well- and 
specially-trained experts in a new and expanded psychological 
profession. That the fulfillment of this picture calls for men and women 
with a breadth and depth of training in logical or mathematical analy- 
sis, in the psychology of growth, in that of individual differences, and 
a knowledge of the activities of society not as yet provided, is a chal- 
lenge, especially to us of the Psychometric Society. 








PSYCHOMETRIKA, VOL. 5, NO. 1 
MARCH, 1940 


A FACTOR ANALYSIS OF MECHANICAL ABILITY TESTS* 


WILLARD HARRELL 
University of Illinois 


The intercorrelations of thirty-seven variables, including the 
Minnesota battery of “mechanical ability” tests, the seven MacQuarrie 
tests of “mechanical ability,” O’Connor’s Wiggly blocks, and the 
Stenquist picture-matching test, were analyzed by Thurstone’s centroid 
method. Five factors, Perceptual, Verbal, Youth, Manual Agility, 
and Spatial, were taken out. Factors prominent in so-called 
mechanical ability tests are the Spatial and Perceptual ones, with 
MacQuarrie’s dotting test significantly high in the Manual Agility 
factor. Each of the factors can be measured with group 
pencil-and-paper tests. 


PROBLEM 

Although the measurement of mechanical ability is no longer con- 
sidered to be a problem directly related to the nation’s immigration 
policy as it was in the Minnesota investigation, it still has a patriotic 
flavor. The need for adequate tests in selecting thousands of men to 
be trained as airplane mechanics illustrates this. The Minnesota in- 
vestigators emphasized the magnitude of the problem of mechanical 
ability by relating that (7, p. 3) “...in the United States the largest 
single occupational class is that engaged in the manufacturing and 
mechanical industries. The members of this group make up nearly 
one-third of the working population. ... ” 

Guilford (3, p. 510) lists mechanical ability as one of the inde- 
pendent, primary factors about whose existence many already agree. 
Since no citation is given, the basis of agreement is left uncertain. It 
is likely that the agreement is not general, and that present thought 
is little advanced from the time when the Minnesotans wrote (7, p. 5): 


At present there is little agreement as to the definition of mechanical 
ability. In the past, and in common usage, it has often been confused 
or made synonymous with motor ability, but most investigators now 
agree that mechanical ability is a more inclusive term. There has de- 
veloped a fairly definite consensus of opinion that the term “motor 
ability” should be restricted to rather simple functions of isolated 
muscles or muscle groups; whereas the term “mechanical ability,” the 
exact nature of which remains undetermined, is used to refer to whatever 
capacities and abilities are necessary for certain kinds of work: 
specifically, work that involves the manipulation of tools, the operation of 
machinery, and the planning and execution of pieces of work which 
involve these and similar activities. 

* Acknowledgment is gratefully made to the State Engineering Experiment 
Station at the Georgia School of Technology for sponsoring and financially 
supporting the studies; to the Graduate Research Committee of the University of 
Illinois for providing funds for the purchase of tests and the tabulation of data; 
and to Dr. E. L. Welker of the University of Illinois Mathematics Department 
for assistance with statistical problems. 

The factorial approach is a promising one for the analysis of me- 
chanical ability since it permits (3, p. 472) the determination of the 
smallest number of abilities that must be postulated in order to ac- 
count for a table of intercorrelations, as well as the determina- 
tion of how much of each ability is represented by each test. The 
third aim of factor analysis, the important practical one of setting up 
regression equations by which an individual’s amount of any primary 
ability can be estimated from tests that depend upon that ability, is 
not included in the present study. 

Like the intelligence testers who defined intelligence as what the 
intelligence tests test, the tentatively adopted definition of mechanical 
ability is “whatever the mechanical ability tests test.” This circular 
definition is not facetious, as it sounds, for practically all measures 
called tests of “mechanical ability” have been validated against some 
criterion. The most commonly used validating criterion has been 
some measure of success at mechanical pursuits, either marks or a 
product from a machine shop course. 


SUBJECTS 

The subjects were 91 Georgia cotton mill machine fixers. They 
ranged in age from 19 to 51; the mean was 30.5 years. Their school- 
ing varied from none to the completion of college; the mean schooling 
was 7.8 years. They were chosen because of the prospect of validat- 
ing criteria in the form of superiors’ ratings. The men were from sev- 
eral departments—weaving, carding, and spinning contributing the 
largest number. The fixer’s job involves diagnosing mechanical diffi- 
culties, taking machines apart, repairing and replacing worn parts 
with new ones, and putting the machine together again. 


SELECTION OF VARIABLES 


Thirty-one of the thirty-seven variables included are scores on 
separate tests. The Minnesota spatial relations test was scored both 
for time and errors. The five non-test variables are age, school grade 
completed, experience on mechanical jobs, and two ratings. The 
ratings were made on one scale devised to measure competence on the 
job, and on a second called “mechanical ability.” 

The selection of tests was guided mainly by the effort to analyze 
the Minnesota battery (7). The Minnesota tests, namely assembly (6),* 
spatial relations (4 and 5), form boards (7), and interest blank (8), 
formed the core around which other tests were included for their 
promise in identifying the abilities guessed to be present in the 
Minnesota tests. 

Two new variables, both of which employ the Minnesota mechan- 
ical assembly material, are included. One of the new variables was 
the time required to assemble the Minnesota material after it had 
been administered according to the usual instructions; the other new 
variable was the time required to take the material apart. 

Other tests included which bear the name of mechanical ability 
are the Stenquist I—a picture matching test (8), O’Connor’s wiggly 
blocks (6), and the MacQuarrie test (5). 

Since the spatial relations ability is commonly regarded to be an 
important constituent of mechanical ability, a punched holes test (11) 
and Thurstone’s two lozenges tests (9) were included. 

Dexterity is another possible constituent of mechanical ability. 
It has been mentioned as a probable factor, for example, in assembly 
tests. Consequently Whitman’s (15, 1) series of three pinboards, a 
pegboard and peg sorting, and two tests of manipulating nuts and 
bolts were included along with Crockett’s (2) block packing, laying 
blocks along strips, and screwing different sized nuts on bolts and 
placing them in a board. 

Three other tests, opposites, word analogies, and word comple- 
tion were chosen for their apparent independence of mechanical abil- 
ity. Table 1 shows the time limit and scoring method for each of the 
tests. 


TEST ADMINISTRATION 

Administration of the test battery consumed about seven hours 
divided fairly evenly into four sessions. The first and last sessions 
were done in groups of from 12 to 20 subjects. The second and third 
sessions were necessarily individual due to the nature of the tests. 
Three cotton mill organizations co-operated in the study which was 
made in four Georgia towns. Testing was done in vocational school 
buildings in three of the towns; in the fourth a storeroom in the mill 
was used. Employees in one of the firms were paid by their company 
for their time, while those of the other two organizations donated 
their time. The only perceptible difference between the voluntary 
groups and the paid group was less perfect attendance in the former. 


* Numbers in parentheses that are not bold-face refer to the variable num- 
bers as given in Tables 1, 7, and 8 





TABLE 1 
Time Limits and Scoring of Tests 

Test                                          Time               Scoring 

Session I 
  Interest Blank                              Work limit         Wts. 
  Form Board A                                15 min.            R 
  Stenquist I                                 86 min.            R 

Session II 
  Assembly                                    56 min. 50 sec.    Wts. 
  R. Assembly                                 Work limit         Time 
  R. Stripping                                Work limit         Time 

Session III 
  Spatial Relations                           Work limit         Errors; Time 

  Whitman Manual Dexterity Series 
    Pinboard, preferred hand                  1 min.             1/3 no. holes 
    Pinboard, non-preferred hand              1 min.             1/3 no. holes 
    Pinboard, both hands                      2 min.             No. holes 
    Pegboard                                  1 min.             No. pegs 
    Disassembling nuts and bolts              30 sec.            2 x no. disas. 
    Assembling nuts and bolts                 80 sec.            2 x no. assem. 
    Sorting pegs                              80 sec.            No. pegs 

  Wiggly Blocks, 3 trials                     Work limit         Time 

  Crockett’s Manual Ability Tests* 
    Screwing nuts on bolts                    2 min. each        [illegible] 
    Packing blocks                            1 min. each        [illegible] 
    Laying blocks along a strip, 4 trials     1 min. each        [illegible] 

  MacQuarrie’s Test for Mechanical Ability* 
    Tracing                                   50 sec.            [illegible] 
    Tapping                                   80 sec.            [illegible] 
    Dotting                                   15 sec.            [illegible] 
    Copying                                   2 1/2 min.         [illegible] 
    Location                                  2 min.             [illegible] 
    Counting Blocks                           2 1/2 min.         [illegible] 
    Pursuit                                   2 1/2 min.         [illegible] 

Session IV 
  Thurstone’s S-R, A                          2 min.             [illegible] 
  Thurstone’s S-R, B                          2 min.             [illegible] 
  Punched Holes                               2 min.             [illegible] 
  Opposites                                   8 min.             [illegible] 
  Analogies                                   4 min.             [illegible] 
  Completion                                  4 min.             [illegible] 
  Form Board B                                15 min.            [illegible] 

* There was fore-practice before each of these tests. 

Note: the variable-number column of this table and the scoring entries 
shown as [illegible] are not recoverable from this copy. 


METHOD OF ANALYSIS 

All data were punched on tabulating cards. Details of using 
punched cards in factor analysis will probably be published by Mr. L. 
A. Wilson, who supervised the tabulating work. Pearson correlation 
coefficients were obtained. They are shown in Table 2, and their dis- 
tribution is given in Table 3. 

Five factors were extracted by Thurstone’s centroid method. The 
centroid matrix is given in Table 4, and the distribution of the resi- 
duals after the removal of each factor is given in Table 5. Five fac- 
tors appeared to be enough since the standard deviation of the fifth 
factor residuals is .056, whereas that of the mean original correlation 
coefficient is .098. In other words, the standard deviation of the resi- 
duals is about half that of the correlation coefficients. 
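
As a rough illustration of one step of this procedure, the sketch below extracts a first centroid factor and the residuals it leaves. It assumes the usual textbook form of the centroid computation, in which each loading is a column sum divided by the square root of the grand total, with communality estimates in the diagonal. The three-variable matrix is hypothetical and built from known loadings so that the result can be checked. The sign reflections needed before later factors are extracted are omitted, and none of the function names come from the study.

```python
import math

def centroid_factor(r):
    """First centroid loadings of a square matrix r whose diagonal
    holds communality estimates."""
    n = len(r)
    total = sum(sum(row) for row in r)
    root = math.sqrt(total)
    return [sum(r[i][j] for i in range(n)) / root for j in range(n)]

def residuals(r, loadings):
    """Matrix remaining after the factor's contribution is removed."""
    n = len(r)
    return [[r[i][j] - loadings[i] * loadings[j] for j in range(n)]
            for i in range(n)]

def off_diagonal_sd(m):
    """Standard deviation of the entries below the diagonal."""
    values = [m[i][j] for i in range(len(m)) for j in range(i)]
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))

# Hypothetical one-factor data: r[i][j] = a[i] * a[j].
a = [0.8, 0.7, 0.6]
r = [[a[i] * a[j] for j in range(3)] for i in range(3)]

loadings = centroid_factor(r)
residual_sd = off_diagonal_sd(residuals(r, loadings))
```

Here one factor reproduces the matrix exactly, so the residual standard deviation is zero. In the study the corresponding figure after five factors was .056, which was judged small enough beside .098 to stop.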

Axes were orthogonally rotated 24 times, the transformation ma- 
trix being shown in Table 6. At the suggestion of Professor Thur- 
stone, axes were obliquely rotated 36 times. Oblique rotations were 
performed by a modification of the Thurstone method (12) suggested 
by Mr. L. R. Tucker. 

Because of the interest in comparing the final oblique solution 
with the final orthogonal one, the final orthogonal solution (Table 7), as 
well as the final oblique solution (Table 8), is shown in detail. A 
comparison of the two solutions also facilitates identification of the 
common factors. The final oblique transformation matrix is given in 
Table 9. 

The aim in the rotation procedure was to maximize the number 
of insignificant factor loadings. A test adopted for significance was 
±.20, a value twice the standard deviation of the mean correlation 
coefficient. Only one negative loading exceeds —.20, and this is not 
for a test variable but for a rating which has been reflected, so the 
negative loading is consequently comprehensible. 
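
The criterion lends itself to a simple count. In the sketch below the cutoff is computed as twice .098 and rounded to two places. The loading matrix is illustrative only: the values .415, .383, and .561 are loadings reported in this paper, and every other entry is hypothetical filler, as is the function name.

```python
SD_MEAN_R = 0.098                    # figure quoted in the text
CUTOFF = round(2 * SD_MEAN_R, 2)     # twice that value, to two places

def count_insignificant(loadings, cutoff=CUTOFF):
    """Number of loadings whose absolute value falls below the cutoff."""
    return sum(1 for row in loadings for value in row if abs(value) < cutoff)

# Columns: factors I to V. Only .415, .383, and .561 are reported values.
illustrative = [
    [0.415, 0.05, 0.10, 0.12, 0.383],
    [0.03, 0.561, -0.02, 0.08, 0.11],
]

n_insignificant = count_insignificant(illustrative)
```

A rotation that raises this count without disturbing the fit would be preferred under the stated aim.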

The use of oblique axes necessitates the interpretation of the re- 
lationship between factors. The cosines between axes (Table 10) 
were treated as correlation coefficients and factored again by the cen- 
troid method to determine the relationship between factors. Two com- 
mon factors came out (Table 11). A discussion of these common fac- 
tors will follow the identification of the group factors. 
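
The cosines referred to here are scalar products of unit vectors along the oblique axes. A minimal sketch follows, using two hypothetical axes in a plane rather than the five of the study; the function names are illustrative. The resulting matrix has ones in the diagonal and can be treated like a small correlation table.

```python
import math

def unit(v):
    """Scale a vector to unit length."""
    length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v]

def cosine_matrix(axes):
    """Cosines of the angles between every pair of axes."""
    units = [unit(a) for a in axes]
    return [[sum(p * q for p, q in zip(u, w)) for w in units] for u in units]

# Two hypothetical axes, 60 degrees apart.
axes = [(1.0, 0.0), (1.0, math.sqrt(3.0))]
cosines = cosine_matrix(axes)
```

Orthogonal axes would give zeros off the diagonal, leaving nothing to factor; it is the obliqueness that makes the second-order analysis possible.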


INTERPRETATION OF FACTORS 
The minimum factor loading of identification variables, follow- 
ing Thurstone (11), was set at about .40. Identification of the fac- 
tors is made by taking the highest values out of the successive col- 
umns of Table 8. The interpretation would be little altered by using 
the final orthogonal solution as shown in Table 7. Discussion, 
however, follows the oblique solution since the number of insignificant 
factor loadings is greater than in the orthogonal solution, and conse- 
quently the discussion is briefer than it would be for the orthogonal 
solution. The group factor loadings of the oblique solution shown in 
Table 8 are often considerably less than those of the orthogonal solu- 
tion shown in Table 7. Identification of group factors in terms of the 
oblique solution consequently suffers from the apparently weakened 
validity of the tests. 


Factor I 
(9)  Routine assembly     .624 
(10) Routine stripping    .457 
(6)  Assembly             .415 
(29) Stenquist I          .397 
(4)  SR accuracy          .392 
(17) Peg sort             .392 
(14) Peg sticking         .384 


Factor I is the most difficult of the five factors to identify—pre- 
sumably because it was not anticipated when the test battery was 
planned. Only one of the seven identification tests here, the Stenquist 
picture matching, is a pencil-and-paper test. The two tests with the 
highest weights involved the Minnesota Assembly material under new 
conditions. None of the tests with high factor loadings are verbal. 
This factor seems to be Thurstone’s perception (P) factor (11), even 
though none of the Thurstone perceptual tests are present in the above 
list. Thurstone (13) writes that: 


The tests that call for this ability require the quick perception of detail 
in either visual or verbal material. This seems to be a perceptual 
ability which enables some people to excel in finding detail which is 
significant to them or detail which they are seeking. It is probably one of the 
factors that are involved in what has been called “quick intelligence.” 
To scan a page to find quickly some small but significant detail, to 
classify objects quickly, are examples of this factor. 

One reason for deciding that this is not essentially a manual 
factor is that factor IV is definitely manual. The factor I 
identification tests differ from those of factor IV in that they demand more 
discrimination and are less routine. 

The Minnesota assembly test does not significantly possess a 
manual dexterity factor, as some have thought, and neither are its 
scores significantly influenced by experience on mechanical jobs, as 
others have written. A practice effect is shown, however, when the 
test is repeated. While both P and S are present when the test is 
given for the first time, the S factor drops out upon a second adminis- 
tration, leaving a significant weight in P only. Since an accuracy 
score was used at the first administration and a speed score was used 
at the second administration, it may be insisted that this 
interpretation is only tentative. Still, the indication that the perceptual factor 
becomes more prominent as the task becomes easier is hearteningly 
consistent with the interpretation of factor P. 


Factor II 
(35) Completion           .561 
(33) Word Opposites       [illegible] 
(34) Word Analogies       .480 
(2)  Schooling            .456 


Factor II is clearly verbal (V) since the three verbal tests, com- 
pletion, opposites, and analogies are among the identification vari- 
ables and do not show significant loadings with any other of the four 
factors. Schooling also is saturated with this verbal factor as one 
might anticipate. This is another factor which Thurstone has isolated 
(13). The verbal factor is characterized by tests that involve the in- 
terpretation of language. This factor, unlike a factor W, is not re- 
stricted to mere fluency with words. The factor V reflects an ability 
to deal readily and quickly with verbal material. The orthogonal so- 
lution in Table 7 shows that the tapping test (23) has a strangely high 
loading in the second factor. This may be explained by the hetero- 
geneity of the subjects’ schooling. In other words, the longer one is 
in school, the more practiced he is in using a pencil and hence the 
higher his tapping score. 


Factor III 
(1)  Youth                            .630 
(3)  Inexperience                     .618 
(37) Poor mechanical ability rating   .491 
(14) Peg sticking                     .450 
(17) Peg sorting                      .383 


Only two tests, Peg sticking and Peg sorting, are present here to 
a high degree. The two variables with the highest weights, Youth 
and Inexperience, identify this factor as Youth (Y). The high load- 
ings in the two tests may be due to a youthful willingness to follow 
directions, or to a deterioration with age. The former explanation is 
preferred. Why just these two tests and no others should be highly 
weighted with Y may be understood by remembering that sticking 
and sorting the pegs according to their colors impressed the machine 
fixing subjects as more childish than other tests. It looked fairly sen- 
sible, for example, to the machine fixers to put together a spark plug 
or a pair of calipers and there the older fixers would co-operate, but 
they could not understand why they should hasten to put red pegs in 
one tray and purple pegs in another tray. The younger fixers, 
however, were more impressed with the entire testing program and could 
not be dissuaded from believing that their responsiveness would 
influence their future chances of promotion. 

The high weight of poor mechanical ability rating may account 
for the absence of correlation between this rating and mechanical 
ability tests. Ratings were influenced considerably by length of ac- 
quaintanceship, as other investigators have found. 

While Youth has not been isolated before as a factor, other stud- 
ies have shown a maturation factor. 


Factor IV 
(13) Pinboard with both hands            .505 
(24) Dotting                             .501 
(15) Nuts off                            .426 
(20) Block packing                       .384 
(12) Pinboard with non-preferred hand    .380 
(16) Nuts on                             .378 


Factor IV includes only one paper-and-pencil test, dotting. Other 
tests highly saturated with factor IV are two of the pinboard tests, 
two nut and bolt tests, and block packing. This factor is clearly one 
involving manual dexterity or agility (A). Although the tests iden- 
tifying both factors IV and I are manual, a distinction may be drawn 
in that tests high in IV are more routine than those high for I, which 
requires perception, e.g., in picking an object of a particular color. 


Factor V 
(32) Punched holes            .585 
(27) Block counting           .540 
(7)  Minnesota form boards    .517 
(28) Pursuit                  .515 
(25) Copying                  .494 
(31) Lozenges B               .487 
(6)  Assembly                 .383 


Factor V is clearly spatial (S), another of the factors which 
Thurstone has identified. The assembly is the only non-paper test 
among the identification tests. The paper tests highly saturated with 
this factor, punched holes, blocks, form boards, pursuit, copying, and 
lozenges, are all spatial in nature and several of them have been so 
analyzed in other factor studies. 

According to Thurstone (13) the spatial factor 


... is an ability that is involved in the tests of visualizing space. The 
factor S is present in those tests which require the subject to think vis- 
ually of geometrical forms and of objects in space. While none of these 
factors can be described in detail as yet, it seems reasonable to expect 
that those who have a high rating on ability S should be able to do well 
in those studies and in those occupations that require visualizing or 
thinking about things in visual form. Many people think about a prob- 
lem visually even when the nature of the problem does not immediately 
suggest any necessary visual character. 













COMMON FACTORS 

When the cosines of the angles between the axes of the group 
factors (Table 10) were factored, two factors were removed. No other 
significant factor seemed to be present since the standard deviation 
of the residuals was .035. The two common factors (Table 11) seem 
to be g and a common speed factor. Perhaps subjects more homogene- 
ous for age and schooling would give smaller intercorrelations be- 
tween group factors. 


SUMMARY AND CONCLUSIONS 

Mechanical ability tests, according to this analysis, are composed 
principally of factors P and S. 

Ratings of mechanical ability, and of success on machine fixing 
jobs, probably because of their low validity, do not show sufficient 
weights to allow an analysis of mechanical ability except in terms of 
tests. Harrell (4) found in a previous study that an assembly test 
correlates significantly with the success of loom fixers where ratings 
were more reliable than in the present study. 

P and S are the only factors composing mechanical ability tests 
with the single exception of the MacQuarrie Dotting Test which has 
a .501 weight in A. Since some definitions of mechanical ability in- 
clude routine manual tasks, it is not strange to find manual dexterity 
represented in the analytical definition. 

The four Minnesota tests, the seven MacQuarrie tests, wiggly 
blocks, and the Stenquist picture matching test are the tests which 
have been labeled measures of mechanical ability. Spatial relations 
accuracy shows its single significant weight (.392) in the P factor. 
The Minnesota assembly is the only test to have significant weights in 
both P and S (.415 and .383 respectively). Perhaps this fact, as well 
as its impressive appearance, explains its popularity. The Minnesota 
form boards show their single significant weight with S (.517). The 
only significant loading of the Stenquist picture matching test (.397) 
is found in P. The interest blank, the spatial relations speed scores, 
the wiggly blocks, and the tracing, tapping, and location tests all fail 
to show a significant factor loading. 

Three of the MacQuarrie sub-tests, copying, blocks, and pursuit, 
possess high loadings only in S. The loadings are .494 for copying, 
.540 for block counting, and .515 for visual pursuit. 

In emphasizing perception and spatial ability as elements of me- 
chanical aptitude it is not intended to insist that other factors are not 
present in other so-called mechanical ability tests. 

Besides the two factors P and S, three additional factors were
found. These were the familiar verbal factor, and two new factors,
Youth, and Manual Agility. Manual Agility is shown not only in the
dotting test, but in such typically routine mechanical tasks as screw-
ing nuts on bolts, and in the pinboard test and block packing.

[Table of original correlation coefficients for the 37 variables, printed
sideways in the source; the entries are not legible in this copy. Table
note: Subsequent statistical work was based on these coefficients to three
decimals. All decimal points are omitted in this table. Signs were changed
in several variables, including 9, 10, 18, 36, and 37; the rest of the list
is not legible.]

The factors are not independent but exhibit two common factors 
which are tentatively identified as g and a common speed factor. 

A useful finding of the present study is that paper-and-pencil 
tests measure each factor that has been measured by such cumber- 
some materials as the Minnesota assembly and spatial relations tests, 
and wiggly blocks. The latter two can be given only as individual 
tests, whereas all of the paper-and-pencil tests used may be given as 
group tests. 

While only one paper test, the Stenquist picture matching, has a 
significant weight in factor P, if its identification is correct, other 
paper tests may be used as Thurstone has shown (13). 

The dotting test is the only paper test which is significantly 
weighted in the Manual Agility factor, but here also other paper tests 
can probably be devised. 


                        TABLE 3
    Distribution of Original Correlation Coefficients

        r      frequency         r      frequency
      -.7*         2            +.1        79
      -.6          4            +.2        90
      -.5         17            +.3        79
      -.4         48            +.4        69
      -.3         55            +.5        40
      -.2         45            +.6        13
      -.1         58            +.7         2
      +.0         64            +.8         1

* The lower limit of each class interval is listed.





                        TABLE 4
                     Centroid Matrix

Variable
Number        I       II      III      IV       V
   1.       .436    .349    .311   -.389    .037
   2.       .393    .390   -.015    .134   -.274
   3.       .406    .212    .311   -.431    .133
   4.       .089   -.171   -.153   -.089   -.354
   5.       .571   -.073   -.150   -.109    .030
   6.       .466   -.324   -.362   -.271    .185
   7.       .764    .109   -.295   -.229    .101
   8.       .270    .224    .134   -.116   -.116
   9.       .666   -.456   -.173   -.254   -.039
  10.       .422   -.446   -.113   -.044   -.054
  11.       .474   -.299    .298   -.047   -.044
  12.       .154   -.299    .128    .222    .192
  13.       .413    [?]     .257    .216    [?]
  14.       .761   -.132    .002   -.244    .354
  15.       .476   -.289    .129    .256   -.119
  16.       .392   -.440    .180    .125   -.069
  17.       .595   -.081    .168   -.288   -.172
  18.       .548   -.192   -.189   -.017    .132
  19.       .524   -.427    .089    .239    .045
  20.       .681   -.120    .200    .083    .077
  21.       .366    .089    .140   -.066    .215
  22.       .595    .082    .189   -.157   -.064
  23.       .629    .204    .261    .233   -.211
  24.       .515    .041    .368    .228    .068
  25.       [?]     [?]     [?]     .061    .127
  26.       .719    .326   -.123   -.008   -.087
  27.       .688    .166   -.272   -.010    .145
  28.       .635    .173   -.185    .136    .190
  29.       .706   -.159   -.176   -.064   -.108
  30.       .585   -.111   -.181   -.069    .083
  31.       .572    .225   -.232    .095    .124
  32.       .675    .278   -.217    .129    [?]
  33.       .703    .406   -.061    .257   -.264
  34.       .691    .414   -.072    .188   -.208
  35.       .652    .344   -.188    .208   -.298
  36.       .069    .210    .108   -.262    .259
  37.       .013    .201    .252   -.375    .119

[?] marks an entry that is not legible in this copy.











                        TABLE 5
     Distribution of Residuals after each Factor was Removed

Residual       I      II     III      IV       V
  .00*        35      61      69     [?]      91
  .01         58      74      78      92      99
  .02         50      63      66      90      92
  .03         47      65      67      87      82
  .04         67      62      70      69      70
  .05         43      52      62      49      59
  .06         35      46      54      48      41
  .07         43      36      37      37      35
  .08         35      42      40      30      26
  .09         35      31      32      27      23
  .10         32      19      23       9      14
  .11         30      18      14      12       4
  .12         25      19       9      13      10
  .13         18      14      12       8       5
  .14         16      21       8       5       4
  .15         18      11       4       5       8
  .16         13       6       6       1       1
  .17         17       5       4       3       0
  .18         14       5     [?]       0       0
  .19          3       3       3       2       1
  .20          5       3       2       0       0
  .21          6       2       3       0       0
  .22          6       3       1       0       1
  .23          3       1       0       1
  .24        [?]       2       0       0
  .25        [?]       0       0       0
  .26          5       0       1       1
  .27          1       0       0
  .28          2       0       0
  .29          0       0       0
  .30          2       0       0
  .31          1       0       0
  .32          4       0       0
  .33          0       0       0
  .34          0       0       0
  .35          1       0       1
  .36          2       1
  .37          0       0
  .38          0       1
  .43          2

* The lower limit of each class interval is listed.
[?] marks an entry that is not legible in this copy.


                        TABLE 6
      Transformation Matrix for Orthogonal Solution

            I       II      III      IV       V
   a      .454    .641    .251    .386    .415
   b     -.715    .562    .221   -.340    .094
   c     -.254   -.007    .433    .646   -.575
   d     -.209    .314   -.836    .385   -.108
   e     -.420   -.419    .048    .413    .690
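A transformation matrix for an orthogonal solution should have columns of unit length that are mutually perpendicular. The sketch below checks this for Table 6. It takes the entry in row d, column IV to be .385, the reading under which that column has unit length; this reading and the helper name are assumptions made for illustration.

```python
# Table 6, rows a-e, columns I-V. Row d, column IV is read as .385 (assumed).
TABLE_6 = [
    [0.454, 0.641, 0.251, 0.386, 0.415],
    [-0.715, 0.562, 0.221, -0.340, 0.094],
    [-0.254, -0.007, 0.433, 0.646, -0.575],
    [-0.209, 0.314, -0.836, 0.385, -0.108],
    [-0.420, -0.419, 0.048, 0.413, 0.690],
]


def max_orthogonality_error(matrix):
    """Largest absolute deviation of (columns . columns) from the identity."""
    n = len(matrix)
    worst = 0.0
    for a in range(n):
        for b in range(n):
            dot = sum(matrix[row][a] * matrix[row][b] for row in range(n))
            target = 1.0 if a == b else 0.0
            worst = max(worst, abs(dot - target))
    return worst


table_6_error = max_orthogonality_error(TABLE_6)
print(table_6_error)
```

With three-decimal entries a small deviation from exact orthogonality is expected from rounding alone.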





                                  TABLE 7
          Factor Loadings of Orthogonal Solution with Communalities

     Variable                        P       V       Y       A       S      h2
  1. Youth                         -.065    .336    .648    .116    .103    .561
  2. Schooling                     -.010    .628    .053   -.052    .005    .400
  3. Inexperience                  -.012    .187    .650    .175    .148    .510
  4. Spatial Relations Accuracy     .369    .082   -.024   -.187   -.126    .194
  5. Spatial Relations Speed        .360    .279    .155    .119    .349    .367
  6. Assembly                       .514   -.043    .124    .028    .528    .561
  7. Form Boards                    .349    .439    .284    .021    .592    .746
  8. Interests                      .001    .310    .267    .022   -.011    .168
  9. Routine Assembly               .742    .109    .202    .186    .334    .750
 10. Routine Stripping              .571    .029   -.007    .202    .166    .395
 11. Pinboard (Preferred hand)      .381    .138    .219    .441   -.028    .408
 12. Pinboard (Non-pref'd. hand)    .124   -.081   -.148    .408    .071    .215
 13. Pinboard (with both hands)     .090    .130    .010    .570    .160    .376
 14. Peg sticking                   .456    .385    .504    .410    .057    .781
 15. Nuts off                       .309    .195   -.100    .490    .113    .396
 16. Nuts on                        .449    .071   -.029    .437   -.043    .400
 17. Peg sort                       .418    .317    .437    .184    .055    .503
 18. Wiggly blocks                  .382    .184    .034    .208    .411    .391
 19. Placing bolts                  .451    .151   -.122    .515    .131    .524
 20. Block packing                  .294    .362    .165    .496    .201    .531
 21. Block strip                   -.010    .173    .238    .265    .235    .212
 22. Tracing                        .223    .404    .377    .237    .119    .425
 23. Tapping                        .111    .678    .111    .345   -.040    .605
 24. Dotting                        .035    .393    .110    .539    .028    .459
 25. Copying                        .155    .543    .102    .169    .493    .591
 26. Location                       .163    .679    .202    .048    .341    .647
 27. Blocks                         .204    .472    .107    .089    .559    .596
 28. Pursuit                        .105    .468    .013    .198    .503    .522
 29. Stenquist I                    .537    .390    .114    .148    .312    .571
 30. Thurstone A                    .370    .258    .106    .154    .402    .400
 31. Thurstone B                    .086    .473    .019    .082    .468    .457
 32. Punched holes                  .050    .546    .089    .159    .557    .638
 33. Word Opposites                 .101    .871    .012    .084    .155    .800
 34. Word Analogies                 .084    .822    .067    .066    .203    .733
 35. Completion                     .179    .803   -.030   -.030    .183    .712
 36. Poor General Rating           -.200   -.029    .342    .081    .194    .196
 37. Poor Mechanical Rating        -.195   -.031    .483   -.006    .005    .272

     Σ                             8.171  12.121   5.734   7.857   7.933  18.013











                                  TABLE 8
                  Final Factor Loadings Using Oblique Axes

     Variable                        P       V       Y       A       S
  1. Youth                         -.012   -.038    .630    .023    .068
  2. Schooling                     -.059    .456    .058   -.047    .079
  3. Inexperience                   .033   -.177    .618    .061    .082
  4. Spatial Relations Accuracy     .392    .282   -.043   -.194   -.178
  5. Spatial Relations Speed        .251    .043    .058   -.018    .275
  6. Assembly                       .415   -.190   -.026   -.163    .383
  7. Form Boards                    .228    .052    .142   -.176    .517
  8. Interests                      .016    .149    .267   -.005   -.007
  9. Routine Assembly               .624   -.026    .068   -.004    .154
 10. Routine Stripping              .457    .001   -.090    .091    .044
 11. Pinboard (Preferred hand)      .295   -.036    .193    .357   -.125
 12. Pinboard (Non-pref'd. hand)   -.005   -.185   -.160    .380    .042
 13. Pinboard (with both hands)    -.076   -.187   -.013    .505    .137
 14. Peg sticking                   .384    .059    .450    .270   -.063
 15. Nuts off                       .123   -.006   -.135    .426    .072
 16. Nuts on                        .319    .007   -.055    .378   -.130
 17. Peg sort                       .392    .110    .383    .065   -.051
 18. Wiggly blocks                  .230   -.050   -.078    .060    .332
 19. Placing bolts                  .248   -.018   -.174    .430    .058
 20. Block packing                  .128   -.002    .118    .384    .146
 21. Block strip                   -.078   -.150    .200    .180    .212
 22. Tracing                        .167    .094    .336    .134    .065
 23. Tapping                       -.015    .367    .121    .322   -.001
 24. Dotting                       -.111    .035    .117    .501    .048
 25. Copying                       -.015    .109    .007    .035    .494
 26. Location                       .049    .305    .129   -.059    .352
 27. Blocks                         .045    .077   -.008   -.062    .540
 28. Pursuit                       -.080    .058   -.077    .077    .515
 29. Stenquist I                    .397    .179    .009   -.008    .222
 30. Thurstone A                    .238    .004    .001    .010    .327
 31. Thurstone B                   -.066    .117   -.065   -.026    .487
 32. Punched holes                 -.134    .086   -.054    .034    .585
 33. Word Opposites                -.044    .547   -.014    .047    .234
 34. Word Analogies                -.048    .480    .033    .014    .271
 35. Completion                     .046    .561   -.071   -.073    .244
 36. Poor General Rating           -.155   -.257    .321   -.028    .180
 37. Poor Mechanical Rating        -.082   -.196    .491   -.043   -.020

     Σ                             4.497   2.660   3.687   3.883   6.585





                        TABLE 9
          Final Oblique Transformation Matrix

            A       B       C       D       E
   a      .249    .163    .142    .207    .347
   b     -.630    .277    .272   -.278    .285
   c     -.192   -.184    .569    .719   -.573
   d     -.460    .297   -.763    .527    .074
   e     -.541   -.881   -.027    .292    .682
   Σ    -1.574   -.328    .193   1.467    .815


                        TABLE 10
            Cosines of Angles between Axes
         and Original Common Factor Loadings

            I       II      III      IV       V
   A     1.000    .046    .045    .402    .429
   B      .046   1.000    .402    .406    .482
   C      .044    .400   1.000    .251    .407
   D      .404    .406    .251   1.000    .482
   E      .430    .481    .408    .482   1.000
   Σ     1.924   2.333   2.106   2.541   2.800
   k1     .452    .605    .505    .675    .761
   k2    -.461    .325    .387   -.175    .063


                        TABLE 11
             Common Factor Loadings after
           45° Clockwise Orthogonal Rotation

            I       II      III      IV       V
   k1    -.006    .658    .631    .354    .583
   k2     .646    .198    .083    .601    .494
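The relation between Tables 10 and 11 can be sketched numerically. The example below reads the second-factor loadings for axes II and III in Table 10 as .325 and .387 (an assumption about unclear digits), and it models the rotation as a 45° turn with the second axis reversed so that the loadings come out positive; that sign convention and the function name are assumptions rather than details given in the text.

```python
from math import cos, radians, sin

# Common factor loadings from Table 10 (axes I-V).
K1 = [0.452, 0.605, 0.505, 0.675, 0.761]
K2 = [-0.461, 0.325, 0.387, -0.175, 0.063]  # II and III read as .325, .387 (assumed)


def rotate_and_reflect(k1, k2, degrees=45.0):
    """Rotate the two factor axes by `degrees`, then reverse the second axis."""
    c, s = cos(radians(degrees)), sin(radians(degrees))
    new1 = [c * a + s * b for a, b in zip(k1, k2)]
    new2 = [s * a - c * b for a, b in zip(k1, k2)]
    return new1, new2


rotated_k1, rotated_k2 = rotate_and_reflect(K1, K2)
rotated_k1 = [round(v, 3) for v in rotated_k1]
rotated_k2 = [round(v, 3) for v in rotated_k2]
print(rotated_k1)
print(rotated_k2)
```

With these readings the rounded results match the two rows of Table 11, which lends some support to the assumed digits.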
REFERENCES

 1. Bronner, A. F., et al. A manual of individual mental tests and testing.
    Boston: Little, Brown and Co., 1927.
 2. Crockett, A. C. Measure of manual ability. J. appl. Psychol., 1930, 14,
    414-424.
 3. Guilford, J. P. Psychometric methods. New York: McGraw-Hill Book Co.,
    1936.
 4. Harrell, W. The validity of certain mechanical ability tests for selecting
    cotton mill machine fixers. J. soc. Psychol., 1937, 8, 279-282.
 5. MacQuarrie, T. W. A mechanical ability test. J. person. Res., 1927, 5,
    329-337.
 6. O'Connor, J. Born that way. Baltimore: Williams and Wilkins, 1928.
 7. Paterson, D. G., et al. Minnesota mechanical ability tests. Minneapolis:
    Univ. Minnesota Press, 1930.
 8. Stenquist, J. L. Stenquist mechanical aptitude tests. Yonkers, New York:
    World Book Co., 1921.
 9. Stoelting, C. H., and Co. Catalog of psychological and physiological
    apparatus and supplies. Chicago.
10. Thurstone, L. L. The vectors of mind. Chicago: Univ. Chicago Press, 1935.
11. Thurstone, L. L. Primary mental abilities. Psychometric monographs, No. 1.
    Chicago: Univ. Chicago Press, 1938.
12. Thurstone, L. L. A new rotational method in factor analysis. Psychomet-
    rika, 1938, 3, 199-218.
13. Thurstone, L. L. Manual of instructions. Tests for primary mental
    abilities. Washington: Amer. Council Educ., 1938.
14. Tucker, L. R. A method for finding the inverse of a matrix. Psychomet-
    rika, 1938, 3, 189-197.
15. Whitman, E. C. A brief test series for manual dexterity. J. educ.
    Psychol., 1925, 16, 118-123.



















PSYCHOMETRIKA—VOL. 5, NO. 1 
MARCH, 1940 


POINTS OF NEUTRALITY IN SOCIAL ATTITUDES OF 
DELINQUENTS AND NON-DELINQUENTS* 


RUTH BISHOP 


College Entrance Examination Board 
Princeton, New Jersey 


Delinquent boys are compared with non-delinquents with re- 
spect to their attitudes towards a series of “good” and “bad” so- 
cial acts, by the use of scales having rational origins of measure- 
ment. A new technique, essentially an extension of Thurstone’s 
Method of Successive Intervals, is found to give results similar to 
Horst’s Method of Balanced Values. Significant differences in mean 
attitude between the two groups are not found. 


The general purpose of this study is to determine whether differ- 
ences can be demonstrated between the attitudes of delinquents and 
non-delinquents. Although the criminal code is based upon the place- 
ment of acts on a crude psychological scale in order of their social 
acceptability, it is quite possible that the members of the different 
groups in a society do not agree as to the order in which the acts may 
be divided into “good” and “bad” forms of behavior. It might be ex- 
pected that delinquent and non-delinquent groups would demonstrate 
differences in their ability to make distinctions between the social ac- 
ceptability of any given pair of acts. For example, one of the groups 
might make little or no distinction between two acts and place them 
at about the same place on the scale, while the other group might 
make a clear distinction between the acts, even considering one of 
them as being definitely on the “good” side of the scale and the other 
definitely on the “bad” side of the scale. 

In the studies made on this problem, a marked similarity in ver- 
bal attitudes between delinquent and non-delinquent groups has been 
reported. Simpson,† on the basis of a series of experiments in which
he used a psychophysical rank order technique, concluded that the atti- 
tudes of college students and school teachers toward a group of asocial 


* The author wishes to thank Dr. M. W. Richardson for his invaluable en- 
couragement and counsel in this study. This paper is a part of a dissertation 
accepted by the faculty of the Department of Psychology, The University of 
Chicago, for the degree of Doctor of Philosophy. 

† Simpson, R. M. Attitudes towards the ten commandments. J. soc. Psychol.,
1933, 4, 223-230.

Simpson, R. M. Attitudes of teachers and prisoners toward seriousness of
criminal acts. J. crim. Law Criminol., 1934, 25, 76-83.







acts were not essentially different from the attitudes of inmates in 
state prisons. According to the experimental findings of Thurstone* 
and of Durea,† who used similar techniques, college students place
crimes according to their seriousness in practically the same rank or- 
der as found by Simpson for a prison population. 

In all of these studies, however, the methods used have permitted 
the acts to be placed on the scale only in terms of the stimulus differ- 
ences and have not permitted direct comparisons to be made, since 
the point at which the delinquents and non-delinquents begin to sepa- 
rate acts into “good” and “bad” forms of behavior was not deter- 
mined. It is entirely possible that, although the two groups place the 
acts in approximately the same rank order on the scale, one group 
may consider many more of the acts as “good” than the other, thus 
leading to a real difference between the attitudes of the two groups. 
It therefore becomes desirable to set up scaling techniques which make 
it possible to locate the origin of the scale for the delinquent and non- 
delinquent populations separately, and hence give a basis for making 
direct comparisons between the attitudes of the two groups. 

The purpose of this study can now be stated more precisely as 
the determination of possible differences in the attitudes of delin- 
quents and non-delinquents toward a series of social acts, when meas- 
ured from an absolute point of neutrality. Although the experiment- 
al results may be of interest in themselves, the development of new 
and practical techniques for this and similar problems is considered 
to be of major importance. 

Two techniques were used in this study. One is Horst’s exten- 
sion of the method of paired comparisons.‡ The method of paired
comparisons was first suggested by Thurstone|| for determining scale
separations between a series of stimuli having affective value and 
stating the scale values of the stimuli in terms of their distance from 
an arbitrary origin. Horst’s procedure provides a method for locat- 
ing the origin (point of neutrality) in a scale and stating the scale 
values of the stimuli in the series in terms of their distance from this 
origin. The other technique is an extension of the psychophysical 
method of successive intervals. This method was developed in its 
simplest form by Thurstone, but was first described in the literature 


* Thurstone, L. L. The method of paired comparisons for social values.
J. abnorm. (soc.) Psychol., 1927, 20, 384-400.

† Durea, M. A. An experimental study of attitudes towards juvenile delin-
quency. J. appl. Psychol., 1934, 17, 522-534.

‡ Horst, A. P. A method for determining the absolute affective values of a
series of stimulus situations. J. educ. Psychol., 1932, 23, 418-440.

|| Thurstone, L. L. A law of comparative judgment. Psychol. Rev., 1927, 24,
273-286.













by Saffir.* The method of successive intervals differs essentially from 
Case V of the Law of Comparative Judgment, on which Horst’s meth- 
od is based, in that it is not assumed that the discriminal dispersion 
is the same for all of the stimuli. 

Since the method of successive intervals, as developed by previ- 
ous investigators, gives only the scale separations of the stimuli in a 
given series, it was necessary to develop an extension of this method 
in order that the scale values be expressed in terms of their distance 
from the origin on the scale. This extension is somewhat similar in 
principle to that which Horst developed for the method of paired com- 
parisons. In addition, one change was made in the calculation of the 
scale separations. In order to maximize the consistency of the scale 
values, it seemed preferable to use the centroid of the cumulative 
sigma values between any two successive categories for any given 
stimulus, rather than the cumulative sigma values for the successive 
categories. These values are given by the following equation, which 
is a variation of one of the formulas given by Kelley:†

$$x = \frac{z_1 - z_2}{p_2 - p_1} \tag{1}$$

where

$x$ is the centroid of a truncated segment of a normal curve
which has unit standard deviation;
$z_1$ and $z_2$ are the ordinates enclosing the segment at the left
and right respectively;
$p_1$ and $p_2$ are the proportions of the area under the curve from
the left to the ordinates with corresponding subscripts.
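Equation (1) can be illustrated with a short sketch. The function below takes the two cumulative proportions that bound a category and returns the centroid of the corresponding segment of the unit normal curve, that is, the left ordinate minus the right ordinate, divided by the enclosed area. The function names are illustrative, and treating the ordinate as zero at proportions 0 and 1 is an assumption made so that the open-ended end categories can be handled.

```python
from statistics import NormalDist

_UNIT_NORMAL = NormalDist()  # mean 0, standard deviation 1


def ordinate(p):
    """Height of the unit normal curve at the point cutting off proportion p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # assumed: the curve has zero height at the extremes
    return _UNIT_NORMAL.pdf(_UNIT_NORMAL.inv_cdf(p))


def segment_centroid(p1, p2):
    """Centroid of the segment lying between cumulative proportions p1 < p2."""
    if not 0.0 <= p1 < p2 <= 1.0:
        raise ValueError("need 0 <= p1 < p2 <= 1")
    return (ordinate(p1) - ordinate(p2)) / (p2 - p1)


lower_half = segment_centroid(0.0, 0.5)
upper_half = segment_centroid(0.5, 1.0)
middle = segment_centroid(0.25, 0.75)
print(lower_half, upper_half, middle)
```

The lower and upper halves of the curve give centroids of equal size and opposite sign, and a segment placed symmetrically about the mean gives a centroid of zero, as the formula requires.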


Hence the equations for the standard deviation and for the scale
value of a given stimulus, using the method described by Saffir, be-
come

$$\sigma_{yj} = \sigma_{xi}\,\frac{n_2 \sum x_{1i} - n_1 \sum x_{2i}}{n_2 \sum y_{1j} - n_1 \sum y_{2j}} \tag{2}$$

$$S_{yj} = S_{xi} + \frac{\sigma_{xi} \sum x_{1i} - \sigma_{yj} \sum y_{1j}}{n_1} \tag{3}$$

where

$\sigma_{yj}$ is the standard deviation of stimulus $j$ whose scale value
is to be determined;
$\sigma_{xi}$ is the standard deviation of stimulus $i$ whose scale value is
already known;
$\sum x_{1i}$ is the summation of the centroid values for the categories
of stimulus $i$ used in determining the value on the x-axis
through which point (1) is passed;
$\sum x_{2i}$ is the summation of the centroid values for the categories
of stimulus $i$ used in determining the value on the x-axis
through which point (2) is passed;
$\sum y_{1j}$ is the summation of the centroid values for the categories
of stimulus $j$ used in determining the value on the y-axis
through which point (1) is passed;
$\sum y_{2j}$ is the summation of the centroid values for the categories
of stimulus $j$ used in determining the value on the y-axis
through which point (2) is passed;
$n_1$ is the number of categories common to stimuli $i$ and $j$ in-
cluded in $\sum x_{1i}$, $\sum y_{1j}$;
$n_2$ is the number of categories common to stimuli $i$ and $j$ in-
cluded in $\sum x_{2i}$, $\sum y_{2j}$;
$S_{yj}$ is the scale value for stimulus $j$;
$S_{xi}$ is the scale value for stimulus $i$.

* Saffir, M. A. A comparative study of scales constructed by three psycho-
physical methods. Psychometrika, 1937, 2, 179-191.

† Kelley, T. L. Statistical method. New York: Macmillan Company, 1924,
equation 55, p. 101.
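The two-point calculation behind equations (2) and (3) can be sketched as follows. The sketch assumes that points (1) and (2) are formed from the lower and upper halves of the categories common to the two stimuli, and that the equations have the form given in the code comments; both are readings adopted for illustration, and the function and variable names are hypothetical.

```python
def fit_second_stimulus(x, y, sigma_x=1.0, scale_x=0.0):
    """Estimate (sigma_y, scale_y) for stimulus j from stimulus i.

    x, y: centroid values of the common categories for stimuli i and j,
    listed in the same category order.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two common categories")
    n1 = len(x) // 2        # categories averaged into point (1), assumed split
    n2 = len(x) - n1        # categories averaged into point (2)
    sum_x1, sum_x2 = sum(x[:n1]), sum(x[n1:])
    sum_y1, sum_y2 = sum(y[:n1]), sum(y[n1:])
    # Assumed form of equation (2): ratio of the spreads between the two points.
    sigma_y = sigma_x * (n2 * sum_x1 - n1 * sum_x2) / (n2 * sum_y1 - n1 * sum_y2)
    # Assumed form of equation (3): match the two stimuli at point (1).
    scale_y = scale_x + (sigma_x * sum_x1 - sigma_y * sum_y1) / n1
    return sigma_y, scale_y


# Hypothetical category positions on the common scale.
positions = [-1.5, -0.5, 0.4, 1.2, 2.0]
x_values = list(positions)                       # stimulus i: scale value 0, s.d. 1
y_values = [(t - 0.7) / 1.6 for t in positions]  # stimulus j: scale value 0.7, s.d. 1.6
sigma_j, scale_j = fit_second_stimulus(x_values, y_values)
print(sigma_j, scale_j)
```

Because the hypothetical data are built to follow an exact straight line, the sketch recovers the standard deviation and scale value used to construct them; real proportions would scatter about the line.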


As in the Horst technique, the neutrality point for a series of 
stimuli, for which judgments have been obtained by the method of 
successive intervals, must be located by combining in some manner 
the subject’s judgment about the location of the sum of any two of 
the items in the series with his judgment about the location on the 
psychological continuum of each of the stimuli considered separately. 
The technique developed in this paper differs from that of Horst, in 
which all of the unknowns are solved for simultaneously, in that the
estimate of the neutrality point and the calculation of the scale sepa-
rations for the individual stimuli are divided into two separate pro-
cedures.

The origin of a scale derived from successive interval data may 
be determined if the problem is set up so that there is an odd number 
of categories, one of the categories being defined as neutral. As in the 
Horst technique, the subjects are asked to make judgments about the 
affective value of the sum of any two individual stimuli in the series, 
making use of the descriptive categories employed in judging the in-
dividual stimuli. Theoretically, there are $n(n-1)/2$ summation equa-
tions, but as in any psychophysical technique, situations obviously
close together and clearly some distance from the point of zero affect 
may be omitted from the series since they would yield too high or too 
low proportions to give stable scale values or accurate estimates of 
the point of neutrality. 










The summation situations are first placed on the same arbitrary 
scale as the individual stimuli, using as a unit of measurement the 
standard deviation of some arbitrarily selected individual stimulus. 
The point of neutrality may then be estimated by the equation 





$$K = \frac{\sum_{L=1}^{M} (x_{nL}\,\sigma_L + S_L)}{M} + S_u \tag{4}$$

where

$K$ is the mean estimate of the point of neutrality for the se-
ries;
$x_{nL}$ is the centroid for the category defined as neutral for the
summation situation $L$, which is a combination of individ-
ual situations $i$ and $j$;
$\sigma_L$ is the standard deviation for the summation situation $L$;
$S_L$ is the arbitrary scale value for the summation situation $L$;
$S_u$ is the arbitrary scale value for the individual situation
whose standard deviation has been used as the unit of meas-
urement in estimating the arbitrary scale values of the
summation situations. Obviously, if the standard deviation
of the same situation is used as the unit of measurement
for estimating the scale separations of both the individual
and summation situations, this term will vanish in equation
(4), since in arbitrary scaling procedure it is customary to
assign a scale value of zero to the situation whose standard
deviation has been used as the unit of measurement;
$M$ is the number of estimates made of the point of neutrality.


By a simple linear transformation, the arbitrary scale values may 
now be expressed in terms of their distances on the scale from this 
estimated point of neutrality, thus: 


$$S_{Kj} = S_j + K \tag{5}$$

where

$S_{Kj}$ is the scale value of the individual stimulus $j$ in terms of its
distance from the mean estimate of the point of neutrality
of the scale;
$S_j$ is the arbitrary scale value for individual situation $j$;
$K$ is the value given by equation (4).
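A small numerical sketch may make the two steps concrete: each summation situation yields one estimate of the neutral point (its neutral-category centroid times its standard deviation, plus its arbitrary scale value), the estimates are averaged to give K, and K is then added to every arbitrary scale value. The additive form of both steps follows the equations as read here, and all figures and names in the example are hypothetical.

```python
def neutrality_point(neutral_centroids, sigmas, scale_values, unit_scale_value=0.0):
    """Mean estimate K of the neutral point over the summation situations."""
    if not (len(neutral_centroids) == len(sigmas) == len(scale_values)):
        raise ValueError("one centroid, sigma, and scale value per situation")
    estimates = [
        x * s + v for x, s, v in zip(neutral_centroids, sigmas, scale_values)
    ]
    # unit_scale_value is the scale value of the unit stimulus; it is zero
    # when the same stimulus serves as unit for both kinds of situation.
    return sum(estimates) / len(estimates) + unit_scale_value


def to_neutral_origin(arbitrary_values, k):
    """Re-express arbitrary scale values relative to the neutral point."""
    return [v + k for v in arbitrary_values]


# Hypothetical data for three summation situations.
k_estimate = neutrality_point(
    neutral_centroids=[0.10, -0.20, 0.05],
    sigmas=[1.0, 1.2, 0.9],
    scale_values=[-0.50, -0.10, -0.45],
)
shifted = to_neutral_origin([0.0, 1.3, -0.8], k_estimate)
print(k_estimate, shifted)
```

Averaging several estimates rather than relying on a single summation situation is what makes K a mean estimate in the sense of the definition above.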

In constructing the experimental scale to be used in this inves- 
tigation of the differences in the verbal attitudes of delinquents and 
non-delinquents, a series of fifteen social acts was selected which
seemed to be within the experience of boys of eleven years or older, 
and which seemed to show a fairly even graduation from asocial to 
prosocial acts. The acts were selected from a longer preliminary list 
on the basis of the judgments of a large number of individuals who 
are interested in social issues and who have some understanding of 







the problems included in this study. The experimental schedule is 
divided into four parts. Parts I and II, respectively, give the data for 
the difference equations used in Horst’s extension of the paired com- 
parison method and the extension of the method of successive inter- 
vals described in this paper. Parts III and IV provide the data 
for the summation equations used in the two methods. In order to 
correct for time and space errors, two forms of the scale were con- 
structed. In Form II, the order of the items in each section and pair 
is the reverse of the order in Form I. As nearly as possible, the sched- 
ule is self-administrative, and each subject is given as much time as 
he likes in which to make his responses. 

Two groups of subjects were used in this study: a group of white 
male delinquents and a comparable group of white male non-delin- 
quents or controls. A delinquent was arbitrarily defined as a person 
who was confined in a reformatory or special “school” at the time the 
schedule was given. A control was arbitrarily defined as a person who 
was attending a public or private school, or one who was a member of 
some group in a local community center, at the time the schedule was 
given, and who did not have a court delinquency record as far as 
could be determined by the teachers and officials in the schools and 
community centers. All of the subjects were selected from the Chica- 
go area, and had sixth grade or better reading achievement level. 

Since attitudes are generally considered to be a function of the 
total experiential situation of an individual, it seemed desirable to 
attempt to select the respective delinquent and control populations in 
such a manner that they would be fairly comparable with respect to 
their distribution in certain variables. A difference between the pro-
portions of the two groups in any category in which a comparison
was being made was regarded as significantly greater than zero when
$\frac{p_1 - p_2}{\sigma_{(p_1 - p_2)}} \geq 3.00$.* No significant differences were found in the
experimental proportions of delinquents and controls who (1) had
average or better ratings on a standard intelligence test, (2) had 
eighth grade or better reading achievement test scores, (3) belonged 
to major nationality groups, (4) came from high or low delinquency 
areas as defined by Shaw and McKay.† There was, however, a sig-
nificant difference between the two groups as to age and education,
the delinquents being considerably older and having less education 
than the controls. Nevertheless, this was not considered serious since 


* Guilford, J. P. Psychometric methods. New York: McGraw-Hill, 1936.
Pp. 59-63.

† Shaw, Clifford & McKay, H. D. Juvenile delinquency and urban areas.
Unpublished monograph prepared at the Institute for Juvenile Research, Chi-
cago, 1938.










there were only a few items, and no individual stimuli, which showed 
a significant difference in the proportions of subjects making a given 
response in either the delinquent or control groups when the differ- 
ences in these two selective factors were as large as possible. 
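The significance criterion used above, a difference between two proportions divided by its standard error and compared with 3.00, can be sketched as follows. The standard error formula shown (unpooled, from the two sample proportions) is an assumption, since the text only cites Guilford for the method; the group sizes are those reported for this study, while the two proportions are invented for illustration.

```python
from math import sqrt


def critical_ratio(p1, n1, p2, n2):
    """Difference between two proportions divided by its standard error."""
    # Assumed (unpooled) standard error of the difference.
    standard_error = sqrt(p1 * (1.0 - p1) / n1 + p2 * (1.0 - p2) / n2)
    return (p1 - p2) / standard_error


def is_significant(p1, n1, p2, n2, threshold=3.00):
    """Apply the criterion that the ratio must reach the threshold."""
    return abs(critical_ratio(p1, n1, p2, n2)) >= threshold


# 251 delinquents and 318 controls; the proportions .60 and .50 are hypothetical.
ratio = critical_ratio(0.60, 251, 0.50, 318)
significant = is_significant(0.60, 251, 0.50, 318)
print(ratio, significant)
```

With groups of this size, a difference of ten percentage points falls short of the 3.00 criterion, which suggests how conservative the rule is.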

Since the responses made in the two forms were found to be high- 
ly consistent, and since both the delinquent and control groups seemed 
to be fairly homogeneous, it was decided to consider all the 251 delin- 
quents in one group and all the 318 controls in another for scaling 
purposes.*

The scale values for the delinquent and control groups, as deter- 
mined by the extensions of the psychophysical methods of paired com- 
parisons and successive intervals previously described in this paper, 
are shown in Table I. The relationship between the scale values for 
these two groups is shown graphically in Figures 1 and 2. 

                                  TABLE I
                         Comparison of Scale Values

                                                 Absolute Scale Values
                                             Horst           Extension of
                                           Technique     Successive Intervals
Stimulus                                  Del.    Cont.     Del.    Cont.
Save a drowning child                     4.04    3.66      2.48    3.14
Put out a fire before it becomes
  [illegible]                             3.06    2.86      2.18    2.61
Help an old lady across the street        2.63    2.38      1.95    2.18
Make good grades in school                2.38    2.38      1.97    2.27
Go to church regularly                    1.99    1.84      2.23    2.19
Tell stranger how to find a certain
  [illegible]                             1.96    1.58      1.43    1.49
[illegible]                                .67     .89      1.07    1.05
[illegible]                                .64     .92       .90     .86
Walk four abreast on the sidewalk         -.83    -.47      -.53    -.33
[illegible]                              -1.40   -1.63      -.89   -1.54
[illegible]                              -1.42   -1.48      -.88   -1.27
Kick a dog asleep on the sidewalk        -2.35   -2.47     -1.53   -1.92
Steal a banana from a fruit stand        -2.43   -2.12     -1.35   -1.59
Steal a bicycle                          -3.68   -3.43     -2.55   -3.04
[illegible]                              -4.33   -3.62     -2.99   -3.17








In order to make a direct comparison between the estimates of 
the point of neutrality obtained for the delinquent and control groups, 
it is necessary to use a method based on all scale values in the two ex- 


* The author wishes to thank Dr. A. W. Brown and his students for admin- 
istering the schedules to the adult delinquent groups. 





42 PSYCHOMETRIKA 




















40 
DEL. 
29) 7 ‘ 
° 
CONTROLS 
-40 -29 () 20 a0 
-20) 
-40) 
e 
Fic. 1. — Comparison of the uncorrected scale values for delinquents and 


controls obtained by the extension of the method of paired comparison. 









CONTROLS 
20 49 





49 














Fic. 2. — Comparison of the uncorrected scale values for delinquents and 
controls obtained by the extension of the method of successive intervals. 


perimental series which are being compared. In the method described 
below, it is assumed that both the delinquent and control groups have 
used the same psychological unit of measurement in making their 
judgments. Hence, the dispersions of the scale values in the two ex- 
perimental populations may be equated simply by multiplying each 
scale value in the group having the smaller dispersion by the ratio of 
the dispersion of the second group to this one. The difference in the 
points of neutrality for the two groups may then be defined as the 
residual resulting from the minimization of the sums of the squares 
of the discrepancies in the corrected scale values of the respective 










stimuli for the two experimental populations. The equation for this
difference is developed in the conventional manner. Let
    $S_{cj}$ be the scale value for stimulus $j$ determined on a control
        population;
    $S_{dj}$ be the scale value for stimulus $j$ determined on a delinquent
        population;
    $a$ be the residual which results when the sums of the squares
        of the discrepancies in the scale values have been made a
        minimum;
    $T$ be the number of stimuli in the experimental series.

Then the equation to be minimized is

\[ V = \sum_{j=1}^{T} (S_{cj} - S_{dj} + a)^2 . \tag{6} \]

Expanding this equation, we have

\[ V = \sum_{j=1}^{T} S_{cj}^2 + \sum_{j=1}^{T} S_{dj}^2 + T a^2
     - 2 \sum_{j=1}^{T} S_{cj} S_{dj} + 2a \sum_{j=1}^{T} S_{cj}
     - 2a \sum_{j=1}^{T} S_{dj} . \tag{7} \]

Differentiating this expression with respect to $a$, we have

\[ \frac{dV}{da} = 2Ta + 2 \sum_{j=1}^{T} S_{cj} - 2 \sum_{j=1}^{T} S_{dj} . \tag{8} \]

Setting this equation equal to zero, and solving for $a$, we have

\[ a = \frac{\sum_{j=1}^{T} S_{dj} - \sum_{j=1}^{T} S_{cj}}{T} . \tag{9} \]

This expression also may be written

\[ a = \bar{x}_d - \bar{x}_c , \tag{10} \]

where
    $a$ is the estimated difference in the points of neutrality on the
        two scales being compared;
    $\bar{x}_d$ is the mean of all the scale values in the experimental
        series as determined for a delinquent group;
    $\bar{x}_c$ is the mean of all the scale values in the experimental
        series as determined for a control group.

For this particular sample, the dispersions of the scale values 
for the delinquent and control groups were approximately the same, 
the ratio between the greater and smaller dispersions of scale values 
being 1.15 for each of the methods. Neither was there a significant 










difference in the point of neutrality for the two experimental groups, 
a being less than 0.1 for each of the methods. 
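
The procedure leading to equations (6) to (10) may be illustrated by a
short computation. The sketch below is only an illustration under stated
assumptions: the scale values are hypothetical (they are not taken from
Table I), the function names are invented for the example, and the
"dispersion" is taken to be the population standard deviation, which the
text does not specify.

```python
from statistics import mean, pstdev

def equate_dispersions(control, delinquent):
    """Multiply the group with the smaller dispersion by the ratio of dispersions."""
    sd_c, sd_d = pstdev(control), pstdev(delinquent)  # assumed measure of dispersion
    if sd_c < sd_d:
        control = [s * sd_d / sd_c for s in control]
    elif sd_d < sd_c:
        delinquent = [s * sd_c / sd_d for s in delinquent]
    return control, delinquent

def neutrality_difference(control, delinquent):
    """Equation (10): a = mean of delinquent values minus mean of control values."""
    return mean(delinquent) - mean(control)

def sum_of_squares(control, delinquent, a):
    """Equation (6): V = sum over stimuli of (S_cj - S_dj + a)^2."""
    return sum((c - d + a) ** 2 for c, d in zip(control, delinquent))

# Hypothetical scale values for five stimuli.
control = [2.0, 1.0, 0.5, -1.0, -2.5]
delinquent = [2.4, 1.1, 0.9, -1.3, -2.6]

control_eq, delinquent_eq = equate_dispersions(control, delinquent)
a = neutrality_difference(control_eq, delinquent_eq)
v_min = sum_of_squares(control_eq, delinquent_eq, a)
print("estimated difference in points of neutrality:", round(a, 4))
```

Evaluating `sum_of_squares` at values slightly above or below `a` gives a
larger sum, which is the property that equation (9) expresses.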

When comparisons are made between the scale values for the 
two experimental groups, as determined by either of the scaling meth- 
ods, the following conclusions may be drawn: 


(1) There is a linear relationship between the scale values of the
delinquent and control groups. From this relationship, it
may be deduced that both of the groups were able to
differentiate equally well between the items in the
experimental scale.























Fig. 3. Comparison of the corrected scale values for delinquents obtained
by the extension of the methods of successive intervals and paired comparison.





Fig. 4. Comparison of the corrected scale values for controls obtained by
the extension of the methods of successive intervals and paired comparison.













(2) The foregoing conclusion is further supported by the fact 
that the dispersion of the individual stimuli along the psy- 
chological continuum is essentially the same for the two 
groups. 

(3) The delinquent and control groups locate the point of neu- 
trality at approximately the same place in the experimental 
series. 


From Figures 3 and 4, it may be seen that the scale values given 
by the two techniques used in this study have a linear relationship 
for both the delinquent and control populations. The fit is quite close 
except for the item regarding church attendance. This item gave less 
consistent results than any of the other stimuli in the experimental 
series. There are at least three possible explanations for this dis- 
crepancy. One possibility is that the item cannot be represented 
adequately in one dimension*; another is that the subjects react to 
the item differentially according to the judgment they are asked to 
make; while a third possibility is that the discrepancy is simply due 
to a sampling error. While the present data do not permit a complete 
answer to this question, the third explanation possibly may be cor- 
rect since the item is found to satisfy a linear relationship between 
delinquents and controls within the same technique, and the scale 
values given by the two techniques for the control population are 
considerably more consistent than for the delinquent population. 
There is a possibility, however, that the result is due in part to the 
different kinds of judgments which the subjects were asked to make 
in the two techniques. 

In conclusion, it may be pointed out that although no significant 
differences were demonstrated in the verbal attitudes of the two ex- 
perimental groups used in this study, the techniques would seem to 
be of value in attacking similar problems. Horst’s technique and the 
extension of the method of successive intervals developed in this pa- 
per give essentially the same results. The latter method, however, 
requires considerably less time both in the collection of the experi- 
mental data and in the calculation of the scale values, and, for this 
reason, it is probably preferable for general use. 


* Richardson, M. W. Multidimensional psychophysics. Psychol. Bull., 1938,
35, 659-660.







ANNOUNCEMENT 


The Psychological Test Supplement to Psy- 
chometric Monograph No. 1, now out of print, has 
been reissued through Auxiliary Publication and 
may be obtained from the American Documenta- 
tion Institute, in care of Offices of Science Service, 
2101 Constitution Avenue, Washington, D. C., by 
ordering Document 1317, remitting $3.24 for copy 
in microfilm. This copy, enlarged full size, may be 
used on reading machines now widely available. 
















PSYCHOMETRIKA—VOL. 5, NO. 1 
MARCH, 1940 


FACTORIAL INVARIANCE AND SIGNIFICANCE 


GALE YOUNG AND A. S. HOUSEHOLDER 
The University of Chicago 


It is shown that invariance requirements remove the indeter-
minacy in factor determination and lead to an integration of fac-
torial studies with promise of considerable reduction in computa-
tional labor. The selection of significant primary factors is
discussed with special reference to Thurstone’s simple structure
criterion.


The naive notion of a unidimensional scaling of individuals as
adequate to characterize their intellectual capacities is an old one, 
being implicit in the very word “intelligence,” and being somewhat 
refined in the theory of the I.Q. and in the practice of classroom grad- 
ing by a single scale. But while being more or less useful in a rough 
way, the very efforts to make the notion more precise succeed most 
abundantly in revealing its inherent difficulties. 

Whatever might be meant by the single ability so postulated, it 
would surely be maintained that the more of it a person has the bet- 
ter his performance in any intellectual task will be. But then it is 
readily seen that performance, as measured by test scores, cannot be 
described as a function of a single quantity possessed in different 
amounts by different individuals, since two persons may have the 
same score in one test and yet differ widely in another. 

Historically, factor analysis had its beginning in Spearman’s at- 
tempt to salvage this idea of a single general intellectual factor in 
human behavior (10 and earlier). With a single ability unable to ac- 
count for observed intellectual behavior, one might suppose that the 
next step would have been to try the notion of several general abil- 
ities possessed in varying amounts by different persons and entering 
in different degrees into different tests. However, Spearman pro- 
ceeded differently: he used the idea of many abilities possessed in dif- 
ferent amounts by different persons, but supposed that each test called 
for but two of these; a factor g involved in all tests and a factor spe- 
cific to each test and not involved in any other. 

This is at once seen to be untenable, for if tests can agree only 
in their g factor, then two very similar tests would measure practi- 
cally pure g. In like manner, two other tests similar to each other, 
but not to the first pair, would measure a different g, and the theory




immediately contradicts itself. The same difficulty is found in the 
later Kelley-Hotelling theory in which the first factor tends to swing 
into line with any group of closely similar tests (7). 

Later on, Spearman tried to escape this difficulty by introducing 
additional “group” factors, which could be involved in a number of 
tests. However, since some of these group factors would, depending 
on the test battery, run through all the tests, we have here the im- 
plicit letter, if not the spirit, of multiple general factor analysis. 


I 
Now if the still older “faculty” psychology had not fallen into 
such metaphysical disfavor, Spearman might have begun by trying 
the notion of several general abilities as mentioned above, and the 
concept of “specifics” might not have arisen. Retaining his concep- 
tion of the factor loadings of a test and of an individual as entering 
bilinearly into the resulting score, he might have attempted to express 

the score of individual $i$ in test $j$ as

\[ S_{ji} = \sum_{k=1}^{r} t_{jk} \, p_{ki} , \tag{1} \]

where $r$ is the (unknown) number of general factors and the quan-
tities $t$ and $p$ are the factor loadings of the test and individual re-
spectively.

This is the starting point of the multiple factor systems of Kel- 
ley and Thurstone (12), except that Thurstone introduces specific 
factors as well. In matrix notation (1) may be written as 


S=TP, (2) 


where the test matrix T has r columns and the population matrix P
has r rows. This can hold exactly only if r is as great as the rank of
S, which in general would result in there being more numbers in the
two matrices T and P than in the original score matrix S. It may be
possible, however, depending upon the “clustering” of the test vec-
tors which comprise the rows of S (15), to approximate S closely by
a lower rank matrix $S_r$, which would then replace S in (2), and this
would materially reduce the number of factor loadings required.

In explicit form this idea of matrix approximation is due to Ec- 
kart (2). It is found more or less implicitly in previous writings of 
Hotelling (4) and of Kelley (7), though it is there confused with ques- 
tions of “significant axes.” In a rougher form, but free from the just- 
mentioned confusion, it is found in Thurstone’s centroid factoring 
method. Wilson (13) had just missed the idea when he considered 
varying the observed test vectors to make them satisfy some desired 
condition. However he stopped with noting that this could be done in 










infinitely many ways, without taking the further step of selecting the 
one which required the least change in the data. 

Among methods for calculating the approximating matrix may
be mentioned those due to Hotelling (4, 5) and to Horst (3, 6).


II 
Supposing a matrix $S_r$ of rank $r$ to have been obtained by ap-
proximating to S by any suitable method of approximation, the next
question is as to the interpretation implied by (2) when $S_r$ replaces
S; namely, the determination of T and P in

\[ TP = S_r . \tag{3} \]

At this point controversy arises, because even when $S_r$ is unique, T
and P are not. In fact if $T_0$ and $P_0$ constitute a particular solution of
(3) (and one always exists), then the general solution is given by

\[ T = T_0 Q , \qquad P = Q^{-1} P_0 , \tag{4} \]

where Q is an arbitrary non-singular $r \times r$ matrix. Thus the factor
loadings are not determined uniquely by the approximate score ma-
trix $S_r$.
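
The indeterminacy expressed by (4) is easily verified numerically. The
following is a minimal sketch, assuming NumPy is available; the matrices
are randomly generated and purely hypothetical, and Q is an arbitrary
non-singular matrix chosen for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
r, n_tests, n_people = 2, 4, 6

# A particular solution (T0, P0) and the rank-r score matrix it reproduces.
T0 = rng.normal(size=(n_tests, r))
P0 = rng.normal(size=(r, n_people))
S_r = T0 @ P0

# Any non-singular r x r matrix Q gives another solution of TP = S_r.
Q = np.array([[2.0, 1.0],
              [0.5, 3.0]])          # determinant 5.5, hence non-singular
T = T0 @ Q
P = np.linalg.inv(Q) @ P0

same_scores = np.allclose(T @ P, S_r)      # the product is unchanged
loadings_differ = not np.allclose(T, T0)   # but the loadings are not
print(same_scores, loadings_differ)
```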

There have been two main opinions as to what to do about this.
One of them arises as follows: the least squares method of approxi-
mating to S is geometrically equivalent to rotating S (by suitable
orthogonal matrices multiplying S on the right and left) to its prin-
cipal axes, discarding the components along all but the r longest prin-
cipal axes, and then rotating back again (by means of the inverses of
the orthogonal matrices) (14). There is thus a particular set of axes
appearing in the process, and Kelley and Hotelling attribute psycho-
logical significance to these.
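
In present-day terms the rotation to principal axes described above is
the singular value decomposition, and the least squares approximation is
obtained by keeping the r largest singular values. The sketch below
assumes NumPy; the function name `best_rank_r` and the random score
matrix are hypothetical and serve only to illustrate the idea.

```python
import numpy as np

def best_rank_r(S, r):
    """Least squares rank-r approximation: rotate, truncate, rotate back."""
    U, s, Vt = np.linalg.svd(S, full_matrices=False)  # S = U diag(s) Vt
    return U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]      # keep the r longest axes

rng = np.random.default_rng(1)
S = rng.normal(size=(5, 8))          # hypothetical scores: 5 tests, 8 individuals
S_r = best_rank_r(S, 2)

rank = np.linalg.matrix_rank(S_r)
squared_error = np.linalg.norm(S - S_r) ** 2          # Frobenius norm squared
singular_values = np.linalg.svd(S, compute_uv=False)
discarded = float(np.sum(singular_values[2:] ** 2))   # squares of the dropped axes
print(rank, squared_error, discarded)
```

The squared error of the approximation equals the sum of the squares of
the discarded singular values, which is why no other matrix of the same
rank requires a smaller change in the data.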

On the other hand Thurstone maintains that the indeterminacy 
in (4) is to be fixed by appeal to further considerations. One such 
consideration is found in an invariance requirement, implied in his 
rejection of Hotelling’s solution on the ground that it gives, as men- 
tioned earlier, different factor loadings for a test depending upon 
what other tests it is incorporated with in the test battery (12, p. 
120). It is our thesis, to be demonstrated presently, that the indeter- 
minacy is fully taken care of by invariance requirements. 


III 
Factor analysis has been developed mainly by psychologists, but 
it is not a psychological theory. Rather it is a sort of generalized 







curve-fitting process, and as such may be of service in numerous fields. 
It has been employed, for example, in the study of business trends 
and even in pathology. 

It can be illustrated therefore in other subjects, and for purpose
of discussing some of its fundamental problems it is most clarifying
to consider them in terms of some more physical science. We there-
fore consider the following example from physics. If $S_{ij}$ is the spring
balance weight of body $i$ at location $j$, then theoretically we should
have

\[ S_{ij} = m_i g_j , \tag{5} \]

where m is the body mass and g is the strength of the gravitational
field at the place in question. But this implies that S is of rank one,
which it will not be because of the inevitable presence of errors. Hence
we let A be the best rank one approximation to S and write instead
of (5)

\[ m_i g_j = a_{ij} , \tag{6} \]

or, in matrix notation,

\[ MG = A , \tag{7} \]

where M has one column and G has one row.

But this does not determine the m and g numbers uniquely, for
there is an arbitrary scale factor left over: namely, if q is any non-
zero number, then $(M_0 q)$ and $(q^{-1} G_0)$ give the general solution of
(7), providing $M_0$ and $G_0$ constitute a solution.

As is well known, this arbitrariness is fixed by arbitrarily assign- 
ing the mass number of a selected standard body. All the m and g 
numbers are then fixed in terms of this standard, and any body (or 
location) carried over into another such experiment serves to fix the 
numbers there. Thus eventually all masses are described in terms of 
the standard mass kept at Paris. 
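
The weighing example can be carried through numerically. The sketch
below assumes NumPy; the masses, field strengths and error level are
hypothetical, and the rank one approximation is obtained from the
singular value decomposition. The scale factor q is fixed by assigning
the mass number 1 to the first body, which plays the part of the standard.

```python
import numpy as np

true_m = np.array([1.0, 2.5, 4.0])       # hypothetical masses; body 0 is the standard
true_g = np.array([9.78, 9.81, 9.83])    # hypothetical field strengths at three places
rng = np.random.default_rng(2)
S = np.outer(true_m, true_g) + rng.normal(scale=0.01, size=(3, 3))  # weights with error

# Best rank one approximation A = M0 G0, in an arbitrary scale.
U, s, Vt = np.linalg.svd(S)
M0 = U[:, :1] * s[0]    # one column
G0 = Vt[:1, :]          # one row

# Fix the scale factor q by assigning mass 1 to the standard body.
q = 1.0 / M0[0, 0]
M = M0 * q
G = G0 / q

print(M[:, 0])   # masses in units of the standard body
print(G[0])      # field strengths in the corresponding units
```

The product MG is unchanged by the choice of q; only the units in which
the m and g numbers are expressed are affected.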

But in such a procedure it might be found necessary to add more
factors. If, for example, the bodies were electrically charged and the
locations had vertical electric fields, then (6) would be replaced by

\[ m_i g_j + c_i e_j = a_{ij} , \tag{8} \]

where now A is the best rank two approximation to S, and c and e
measure the charges and electric field strengths, respectively. M now
has two columns, G has two rows, and there is now an arbitrary $2 \times 2$
matrix Q in the general solution given by $(M_0 Q)$ and $(Q^{-1} G_0)$. This
would be fixed by assigning m and c numbers to two bodies, and carry-
ing two bodies (or locations) over into the next experiment, etc. The
only restriction is that the standard bodies be independent and that
the $2 \times 2$ array of numbers assigned them should have a non-zero
determinant. Any pair of bodies carried between experiments must also
be independent, i.e., their $2 \times 2$ minor in M must be non-singular.
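
The manner in which two standard bodies fix the matrix Q may be sketched
as follows. The example assumes NumPy and an error-free rank two matrix
A built from hypothetical masses, charges and field strengths; the choice
of bodies 0 and 2 as standards is likewise only for illustration.

```python
import numpy as np

# Hypothetical loadings: columns of M_true are mass and charge,
# rows of G_true are gravitational and electric field strengths.
M_true = np.array([[1.0, 0.0],
                   [2.0, 0.5],
                   [0.5, 1.5],
                   [3.0, 1.0]])
G_true = np.array([[9.8, 9.8, 9.8],
                   [0.0, 2.0, -1.0]])
A = M_true @ G_true                      # rank two, without error

# An arbitrary particular solution (M0, G0) of M G = A.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
M0 = U[:, :2] * s[:2]
G0 = Vt[:2, :]

# Assign m and c numbers to two independent standard bodies.
standard = [0, 2]
assigned = M_true[standard]              # 2 x 2 array with non-zero determinant
Q = np.linalg.solve(M0[standard], assigned)   # solves M0[standard] @ Q = assigned

M = M0 @ Q                               # loadings of all bodies
G = np.linalg.solve(Q, G0)               # Q^{-1} G0, loadings of all locations
print(np.round(M, 6))
print(np.round(G, 6))
```

Once the four numbers of the standard bodies are assigned, the loadings
of every other body and of every location follow without further choice.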


IV 


The method of determining Q in (4) is now clear. A set of r inde-
pendent tests is to be assigned any non-singular $r \times r$ array of factor
loadings once and for all, and then future testing is to be linked to
this standard by always carrying a set of r independent tests (or in-
dividuals) from each experiment to the next after the manner sug-
gested by Mosier (8). Thus the question of factor indeterminacy is
given a neat and simple solution by explicit appeal to invariance re-
quirements. In addition, this solution makes possible a considerable
reduction in the computational labor of factor analysis:

In the first place, it makes it possible to tie together the results 
of different investigators and thus to acquire a much larger body of 
common data with the same amount of work. The situation at pres- 
ent resembles a number of experimenters busily engaged in weighing 
objects but never passing one around between them to standardize 
their scales so that their results may be reduced to the same units. 

In the second place, having seen how uniqueness is to be obtained 
in principle, we can forget about what loadings are to be assigned the 
standard tests until the time comes to isolate and interpret the fac- 
tors. In other words, the “uniqueness-invariance” part of the factor 
problem is to be handled in practice by simply carrying tests from 
battery to battery; any actual factoring as in (3) may be deferred 
until later. 

Finally, in the third place, the labor of analyzing a test battery 
may be greatly reduced by having a sub-set of the tests to start from. 
Let a number, say h, of strongly independent tests be available; i.e., 
tests which can not be adequately represented by fewer than h fac- 
tors. These may have been obtained from previous experiments, or 
in any other suitable manner. Let these be incorporated as the first 
h tests of a larger battery, and denote the resulting score matrix
when the battery is given to a (fairly large) population of individuals
by

\[ S = \begin{pmatrix} S_1 \\ S_2 \end{pmatrix} . \tag{9} \]

Here $S_1$ is the matrix of scores on the first h tests, $S_2$ the scores on the
additional tests. If the second set of tests involved no factors not
present in the first there would be, apart from errors, a matrix K of
suitable dimensions such that

\[ S_2 = K S_1 . \tag{10} \]


The scores on a test may be regarded as giving the components
of a vector in a space whose dimensionality is equal to the number of
individuals. Then (10) describes a situation in which the $S_2$ vectors
lie in the subspace defined by the $S_1$ vectors. But because of the pres-
ence of errors or additional factors this will not be the case; i.e., the
$S_2$ tests will have components outside the $S_1$ subspace. The compo-
nents which stick out from the $S_1$ subspace may be found rather sim-
ply by considering that the matrix K which makes (10) most nearly
true in a least squares sense, i.e., which minimizes the sum of squares
of the elements of

\[ R = S_2 - K S_1 , \tag{11} \]

is given by

\[ K = S_2 S_1' (S_1 S_1')^{-1} . \tag{12} \]

This process in fact projects the $S_2$ vectors perpendicularly upon the
subspace of the $S_1$ vectors; so that $K S_1$ gives the components of the $S_2$
vectors in that subspace, while R gives the components perpendicular
to that subspace. Once obtained, the matrix $S_1' (S_1 S_1')^{-1} S_1$ is avail-
able for projecting any second set of tests given to the same popu-
lation.

If an $S_2$ vector has a perpendicular component whose length is
much less than that of the component parallel to the $S_1$ subspace, then
that test does not significantly involve any factors other than those in
the $S_1$ tests. On the other hand, if the perpendicular component is
sizeable, then the presence of additional factors is indicated.
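
The projection of (11) and (12), together with the comparison of lengths
just described, may be sketched as follows. The example assumes NumPy;
the score matrices are hypothetical, the additional tests being built
from the first set plus a small error, and the function name
`project_onto_tests` is invented for the illustration.

```python
import numpy as np

def project_onto_tests(S1, S2):
    """Return K of equation (12) and the residual R of equation (11)."""
    K = S2 @ S1.T @ np.linalg.inv(S1 @ S1.T)
    R = S2 - K @ S1
    return K, R

rng = np.random.default_rng(3)
S1 = rng.normal(size=(3, 20))            # h = 3 standard tests, 20 individuals
K_true = np.array([[1.0, -0.5, 2.0],
                   [0.3, 1.2, -0.7]])
S2 = K_true @ S1 + 0.05 * rng.normal(size=(2, 20))   # two further tests, with error

K, R = project_onto_tests(S1, S2)

# Length of the perpendicular component relative to the parallel component.
ratio = np.linalg.norm(R, axis=1) / np.linalg.norm(K @ S1, axis=1)
print(np.round(ratio, 3))
```

A small ratio for a test indicates that it involves no factors beyond
those of the first set; a sizeable ratio would point to additional factors.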

If the R vectors are not negligible, or do not have such a random 
distribution (15) as might suggest that they represent mostly error 
components, pick a number of strongly independent ones from among 
those of significant length and project the rest of the R vectors onto 
these, obtaining a second residual matrix $R_1$ with fewer rows than R.
How to pick such a set of independent vectors will not be considered 
in detail here; the procedure will be sufficiently clear in practice. For 
two vectors it is simply a matter of calculating the angle between 
them. Projecting a third upon the first two will show if it is indepen- 
dent of them, and so on. 

This process may be repeated as many times as necessary. At 
any stage the residual matrix is the same as if a single projection had 
been made into the subspace defined by all the independent tests em- 
ployed in all stages, so that the number of independent tests se- 
lected in any step is a matter of convenience. The smaller this num- 
ber is, the smaller is the matrix whose inverse has to be calculated as 
in (12). 
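
That the residuals obtained in stages agree with those of a single
projection may be checked directly. The sketch below assumes NumPy and
randomly generated, hypothetical score matrices; the helper `residual`
and the choice of which residual vectors to pick are for illustration only.

```python
import numpy as np

def residual(basis, vectors):
    """Components of the rows of `vectors` perpendicular to the rows of `basis`."""
    K = vectors @ basis.T @ np.linalg.inv(basis @ basis.T)
    return vectors - K @ basis

rng = np.random.default_rng(4)
S1 = rng.normal(size=(2, 30))            # standard tests
S2 = rng.normal(size=(5, 30))            # additional tests

# Stage one: project the additional tests on the standard tests.
R = residual(S1, S2)

# Stage two: pick two independent residual vectors, project the rest on them.
picked, rest = [0, 1], [2, 3, 4]
R1 = residual(R[picked], R[rest])

# Single projection on all the independent tests used in both stages.
all_tests = np.vstack([S1, S2[picked]])
R_single = residual(all_tests, S2[rest])

print(np.allclose(R1, R_single))
```

In the staged computation only 2 x 2 matrices are inverted, whereas the
single projection requires the inverse of a 4 x 4 matrix.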










When this procedure has been carried out, the result is a number,
say $h_1$, of $S_2$ tests which adequately represent all the additional fac-
tors involved in the $S_2$ tests; and these together with the original $S_1$
tests define an $(h + h_1)$-dimensional subspace which practically con-
tains all the test vectors. Thus, as the factor study is extended over a
larger field, any new factors which appear are automatically picked up
and added to the original set. In particular, any factors specific in one
test battery will be caught up in this manner into a common factor
as more tests are added, so that it is not necessary to allow for them
in the formulation of the process.


V

Of course it can not be said in advance how adequate a system 
based on formula (1) will be when applied to experimental data, but 
the above formulation provides a consistent and workable method of 
procedure which allows the basic formula (1) to be systematically 
tested. Replacing (1) by other functions of the factor loadings would 
lead to non-linear systems, mathematically more difficult to handle. 
This is intimately connected with scoring methods and score trans- 
formations, since many non-linear functions can be reduced to linear 
ones by suitable transformations on the variables. This is beyond the 
scope of the present article. 

A full statement of the invariance principle is that any consistent 
system of factor analysis must yield the same values (apart from er- 
rors) for the factor loadings of individuals or tests, regardless of the 
combinations in which they are presented. In other words, the load- 
ings found for a test should not depend upon what other tests are in 
the battery, and the loadings for an individual should not depend upon 
what other individuals are in the population. 

These considerations argue against the use of specific factors, 
normal scores, and such asymmetrical procedures as give rise to a dis- 
tinction between “direct” and “inverse” factor problems (11, 1). 


VI 

We come now to what is perhaps the most interesting part of the 
factor problem, namely, the determination of significant factors. Sup- 
pose that some field has been studied in the manner described above 
and that a set of r independent tests has been found which represents 
an adequate basis for that field. Up to now, the $r^2$ numbers to be
assigned to these standard tests as representing their loadings in the
various factors have been left arbitrary; we have been little con- 
cerned with factor loadings except to note how the indeterminacy in 
obtaining them is to be handled. 










Assigning these $r^2$ numbers expresses the r tests (and hence all
tests dependent on them) in terms of r factors. Of these $r^2$ numbers,
r may be regarded as scale units, and the remaining $r(r - 1)$ as the
direction cosines of r axes in an r-dimensional space. The matter of
scale units is familiar, but how to determine “meaningful” directions 
for the primary factor axes is a question of more difficulty. This ques- 
tion has been clearly pointed out by Thurstone, and his suggestion of 
looking for “simple structure” is about the only attempt at an answer 
which has been made. 


VII 

In discussing this matter we return to the physical example of 
(8). In this there were four numbers to be assigned to two bodies. 
It might be urged that these numbers should be chosen so that those 
in one column are proportional to the masses of the bodies, and those 
in the other column to the charges. The two factors would then relate 
to mass and charge, rather than to some hybrid quantities which are 
linear combinations of these. 

However, it is clear that this choice could not be made from a
knowledge of the A matrix only. The concepts of mass and charge
are based on many kinds of experiments and experience other than
that included in the weighing experiment described above.

Nevertheless the experimenter might make some progress by sup- 
posing that the “true” factors would tend to occur more or less sepa- 
rated in the experiment; i.e., that there would be a number of bodies 
with mass only and a number with charge only. If this were the case, 
then plotting one column of M against the other for any solution of 
(7) would show two lines through the origin along which a number 
of points tend to lie. By transforming to these lines as axes the bodies 
would be described in terms of “separated” factors. 
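
The transformation to the two lines as axes may be sketched numerically.
The example assumes NumPy and hypothetical bodies, three with mass only,
three with charge only and one with both; the directions of the two
lines are here read off from one body on each line, which stands in for
an inspection of the plot.

```python
import numpy as np

# Hypothetical separated loadings (columns: mass, charge).
M_true = np.array([[1.0, 0.0], [2.0, 0.0], [3.5, 0.0],    # mass only
                   [0.0, 0.5], [0.0, 1.0], [0.0, 2.0],    # charge only
                   [1.5, 1.0]])                            # both
Q = np.array([[0.8, 0.6],
              [-0.3, 1.1]])          # arbitrary non-singular matrix
M = M_true @ Q                       # loadings as given by an arbitrary solution

# Unit vectors along the two lines through the origin on which points cluster.
d_mass = M[0] / np.linalg.norm(M[0])
d_charge = M[3] / np.linalg.norm(M[3])
D = np.vstack([d_mass, d_charge])

# Coordinates of every body referred to these lines as axes.
M_sep = M @ np.linalg.inv(D)
print(np.round(M_sep, 6))
```

The bodies lying on a line receive a zero loading on the other axis; the
scale along each axis remains arbitrary, as noted in section VI.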

It is thus clearly seen what sort of considerations are involved in 
Thurstone’s simple structure criterion. It can be further seen why it 
is that his results accord well with judgment and experience, for these 
are employed in selecting and specializing the tests used. This would 
encourage factor separation which, when found, should tend to cor- 
roborate the experimenter’s judgment. 


VIII 
However, there are still other considerations. It should be re- 
membered that in psychology one is not seeking for a unique set of 
factors that have been already isolated, as was the case in the physi- 
cal example cited above. Suppose that the concepts of mass and 
charge had not yet appeared, and that the experimenter was faced 
with the problem of actually selecting the most useful axes, instead 










of merely trying to decide which axes corresponded to already known 
concepts. The keynote of scientific usefulness is simplicity, and thus 
he would select axes on the basis of certain resulting simplicities. 

Now separation of the factors does provide a more simple descrip- 
tion of the bodies, and the experimenter would certainly note this if it 
were present. But this is only one point, and in itself not the most 
important one. He would be more concerned with studying how the 
factor loadings of the bodies and locations are related to other experi- 
ments. He would find, for example, that if he chose the factors in a 
certain way they would have simple laws of variation. Thus m cannot 
be changed without physically altering the body by cutting it up or 
adding to it, while c can be varied by a simple glass rod and cloth rub- 
bing procedure which leaves the mass and appearance unaltered. Or 
the $g_j$ are nearly constant from place to place, while the $e_j$ may vary
widely. The e measure of a place can be changed by simply charging 
a condenser, while to change the g value would require moving enor- 
mous masses of material. 

In this way the experimenter would come to decide on a pair of 
factors from consideration of how they behave in other experiments. 
If this choice should agree with that obtained from factor separation 
in the original experiment, then the latter would be confirmed; if not 
it would probably be superseded. 

Thus it is also for psychological factors: the simple structure 
criterion can suggest certain factor choices, but the ultimate decision 
depends upon such things as the study of the growth laws of factor 
loadings in the population matrix, the effects of operations and acci- 
dents, clinical and abnormal cases, the relation of factor loadings to 
brain wave patterns as suggested by Gerard, neural mechanisms as 
suggested by Rashevsky, Thurstone’s genetic interpretation of unitary 
factors, laws of factor inheritance, etc. 

In the Spearman theory it was thought that g and the specific 
factors had different laws of behaviour; e.g., that g increased with 
age up to about 16 years and was independent of training, whereas 
the specifics were increased by training (9). As has been seen above, 
this type of consideration, to which physiology and other fields can 
contribute quite as much as psychology, is fundamental and much 
more valuable than assumptions as to factor loadings being uncorre- 
lated over the population, etc., which have been more stressed in the 
development of the subject. 

The authors wish to express their thanks to Professor Carl Ec- 
kart, who suggested some of the items connected with the physical ex- 
amples, and to Professors Harold Gulliksen and M. W. Richardson, 
all of the University of Chicago. 





REFERENCES

 1. Burt, Cyril. Correlations between persons. Brit. J. Psychol., 1937, 28, 59-95.
 2. Eckart, Carl and Young, Gale. The approximation of one matrix by an-
    other of lower rank. Psychometrika, 1936, 1, 211-218.
 3. Horst, Paul. A method of factor analysis by means of which all coordi-
    nates of the factor matrix are given simultaneously. Psychometrika, 1937,
    2, 225-236.
 4. Hotelling, Harold. Analysis of a complex of statistical variables into prin-
    cipal components. J. educ. Psychol., 1933, 24, 417-441, 498-520.
 5. Hotelling, Harold. Simplified calculation of principal components. Psycho-
    metrika, 1936, 1, 27-35.
 6. Householder, A. S. and Young, Gale. Matrix approximation and latent roots.
    Amer. Math. Monthly, 1938, 45, 165-171.
 7. Kelley, Truman L. The essential traits of mental life. Cambridge: Harvard
    University Press, 1935.
 8. Mosier, C. I. Determining a simple structure when loadings for certain tests
    are known. Psychometrika, 1939, 4, 149-162.
 9. Piaggio, H. T. H. Mathematics and psychology. Math. Gazette, 1933, 17,
    36-42.
10. Spearman, Charles. The abilities of man. London: Macmillan, 1927.
11. Stephenson, William. The inverted factor technique. Brit. J. Psychol., 1936,
    26, 344-361.
12. Thurstone, L. L. The vectors of mind. Chicago: Univ. Chicago Press, 1935.
13. Wilson, E. B. On overlap. Proc. nat. Acad. Sci., Wash., 1933, 19, 1039-1044.
14. Young, Gale. Matrix approximation and subspace fitting. Psychometrika,
    1937, 2, 21-25.
15. Young, Gale. Factor analysis and the index of clustering. Psychometrika,
    1939, 4, 201-208.
















METHODS OF ITEM VALIDATION AND ABACS 
FOR ITEM-TEST CORRELATION AND CRITICAL RATIO 
OF UPPER-LOWER DIFFERENCE 


CHARLES I. MOSIER AND JOHN V. MCQUITTY 
Board of Examiners, University of Florida 


It is shown that by making the assumption that the knowledge 
of the test-item and the knowledge of the entire test are both dis- 
tributed normally, the correlation coefficient between any item and 
the entire test can be expressed as a function solely of two propor- 
tions — the percentage of a high-scoring group passing the item and 
the percentage of a low-scoring group passing the item. This func- 
tion is expressed graphically as a family of curves for each of two 
conditions — where the high-scoring and low-scoring groups are 
samples of the highest and the lowest quarters respectively, and 
where they are samples from the upper and lower halves. It is 
shown, moreover, that two other common measures of item valid- 
ity, the upper-lower difference and the critical ratio of the upper- 
lower difference, may be drawn on the same coordinate axes. 


Present-day techniques of test construction make extensive use 
of the methods of item-analysis and the criterion of internal consis- 
tency. Long and Sandiford (5) have summarized the methods of de- 
termining item validity in current use and several writers, among 
them Lentz, Hirschstein and Finch (4), Swineford (10) and Pintner 
and Forlano (6) have compared the relative values of certain of these 
methods. Adkins (1) has shown that all of these methods, excluding 
those which depend upon item intercorrelations, fall into three classes 
—measures of the item-test correlation coefficient, measures of the 
regression coefficient of test score on item, and measures of the re- 
gression coefficient of item on test score. She concludes that the corre- 
lation coefficient is the superior method. The work of Richardson (8) 
relating the item-test coefficient to factorial analysis of the matrix of 
inter-item correlations, and of Richardson and Kuder (9) relating it 
to test reliability in a unique and unequivocal fashion make this meas- 
ure of increasing importance. 

In view of the increasing importance of the item-test correla- 
tion coefficient it becomes more desirable to have a method of com- 
putation which shall be less tedious and time-consuming than those 
in common use. The present paper presents an abac which makes 
possible graphical computation from very simple data. 




The difference between the proportion passing the item in the 
highest and lowest quarters on the basis of total test score is a vari- 
ant of several methods widely used. Both the raw difference between 
proportions and the critical ratio of the difference are used. For rea- 
sons to be considered later the use of quarters has certain practical 
advantages. 

It will be convenient to summarize here the notation to be em- 
ployed throughout the remainder of this discussion. 


$r_{it}$ or $r_{xy}$ - - - correlation between an item and total score
          on the test.

$N$ - - - number of cases in a quarter.

$P_u$ - - - proportion of cases in the upper quarter of a test who
          pass a given item.

$P_l$ - - - proportion of cases from the lowest quarter of a test
          who pass a given item.

ULD - - - difference between proportion of passes in upper and
          lower quarters. ULD = $P_u - P_l$.

C.R. - - - critical ratio of ULD, the difference between the two
          proportions divided by the standard error of the
          difference.

$D$ - - - difficulty of an item, measured by the proportion of
          the total group who pass that item.

$d$ - - - difficulty of the item in standard score corresponding
          to the proportion of passes $D$.

$a$ - - - standard-score-equivalent of the upper division point
          of total test score, e.g., the upper quartile.

$b$ - - - standard-score-equivalent of the lower division point
          of total test score, e.g., the lower quartile.


The present study makes one assumption, that the two variables,
knowledge of the item and total score on the test, are normally
distributed, and requires but two data, $P_u$ and $P_l$, the proportions
from the upper and lower quarters, respectively, of the total test score
distribution who pass the item. The basic assumption, that of
normality of distribution, may be made either as an assumption, or, with
more justification, as a definition of the units of measurement. This
definition of units to project a normal distribution is assumed in all
that follows.













(The discussion from this point to the presentation of the abac 
in Figure 1 is a technical justification of its construction and not 
essential to its use.) 

If both variables are normally distributed, with unit standard
deviation and unit area, then for constant $r$ the bi-variate frequency
surface is uniquely determined. If we designate by $x$ the total test
score and by $y$ the item variable, and if $d$ is the value of $y$
corresponding to the proportion of the total group "passing" the item, then

$$\int_{-\infty}^{b}\int_{d}^{\infty} Z(x, y, r)\,dy\,dx = P_{x<b}, \tag{1}$$

where $Z(x, y, r)$ is the normal bi-variate probability surface and
$P_{x<b}$ is the proportion of those whose $x$ scores are less than the
value $b$, who pass the item. Similarly,

$$\int_{a}^{\infty}\int_{d}^{\infty} Z(x, y, r)\,dy\,dx = P_{x>a}, \tag{2}$$

where $P_{x>a}$ is the proportion of those whose $x$ scores are greater
than the value $a$, who pass the item. If we consider those cases in the
upper and lower quarters, $a = +.6745$, $b = -.6745$ and $P_{x>a}$,
$P_{x<b}$ become $P_u$ and $P_l$, respectively. If we subtract equation (1)
from equation (2) for these conditions, we have

$$P_u - P_l = \int_{a}^{\infty}\int_{d}^{\infty} Z(x, y, r)\,dy\,dx
 - \int_{-\infty}^{b}\int_{d}^{\infty} Z(x, y, r)\,dy\,dx = F(r), \tag{3}$$

where $F$ is a function of $r_{it}$ alone. For constant $r_{it}$ and
constant $d$ (a function of item-difficulty), then, we may write:

$$P_u - P_l = \text{const.} \tag{4}$$

It would be possible, of course, to substitute values of $r_{it}$ and $d$,
the value of $y$ corresponding to the point between "pass" and "fail,"
in equation (3) and solve for the value of $P_u - P_l$, or ULD. These
could then be used to prepare a family of curves showing $r_{it}$ for
values of ULD and $D$, the percentage difficulty of the item.
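The substitution just described can be carried out numerically. The
following Python sketch is an illustration only, not the authors'
procedure (they read the values from published charts). It assumes the
conditional law $y \mid x \sim N(rx,\, 1 - r^2)$, integrates over $x$ by
Simpson's rule, truncates the infinite limits at 8 standard deviations,
and divides each joint proportion by the fraction of cases in the tail
(.25) so that the result is the proportion of that quarter passing the
item. The names `joint_pass` and `quarter_proportions` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD = NormalDist()  # standard normal: mean 0, standard deviation 1


def joint_pass(lo, hi, d, r, steps=2000):
    """P(lo < x < hi and y > d) for a standard bivariate normal, |r| < 1.

    Integrates pdf(x) * P(y > d | x) over x by Simpson's rule
    (steps must be even).
    """
    s = sqrt(1.0 - r * r)

    def integrand(x):
        return STD.pdf(x) * (1.0 - STD.cdf((d - r * x) / s))

    h = (hi - lo) / steps
    total = integrand(lo) + integrand(hi)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(lo + i * h)
    return total * h / 3.0


def quarter_proportions(D, r, tail=0.25, bound=8.0):
    """Return (P_u, P_l) for an item of difficulty D and item-test r."""
    d = STD.inv_cdf(1.0 - D)      # pass/fail point on the item variable y
    a = STD.inv_cdf(1.0 - tail)   # upper division point (+.6745 for quarters)
    b = -a                        # lower division point
    P_u = joint_pass(a, bound, d, r) / tail
    P_l = joint_pass(-bound, b, d, r) / tail
    return P_u, P_l


P_u, P_l = quarter_proportions(D=0.20, r=0.60)
print(round(P_u, 3), round(P_l, 3), round(P_u - P_l, 3))
```

Passing `tail=0.50` gives the corresponding values for division at the
median.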

Fortunately, it is not necessary actually to resort to this labor.
Chesire, Saffir and Thurstone in their computing diagrams (2) have
prepared the normal bi-variate surfaces for various values of $r$ and
summarized the proportions in a series of charts. For the division of
the test score at the quartile, the graph designated by Chesire et al. as
"a = .25" (their notation) gives the proportion of the upper or lower
quarter who will pass an item of any given difficulty with any
particular $r_{it}$. Using this chart for items of difficulties of
.10, .20, ..., .90, the proportions of the total group in the upper and
lower quarters were tabled for values of $r$ of .95, .90, .80, ..., .10
and .00. A sample of the table is given below.


TABLE I
Proportions of the Total Group Passing an Item of Difficulty d
if the Item-Test Correlation is r

Difficulty of item         Values of correlation coefficient r
(Chesire's value b)       .70             .60             .50
                       p_u    p_l      p_u    p_l      p_u    p_l
       .10            .075   .000     .065   .002     .053   .004
       .20            .130   .003     .115   .008     .101   .018





[Figure 1. Axes: $P_l$, percentage of lowest quarter passing the item;
$P_u$, percentage of highest quarter passing the item.]

FIGURE 1
Abac for Item-Test Correlation from Percentage of Highest
and Lowest Quarters Passing the Item










The value $p_u$ is Chesire's value $c$ for the intersection of the curve
for $r$ equal to, e.g., .60 with the ordinate $b$ equal to, e.g., .20, and
gives the proportion of the total group who passed the item and were
in the upper quarter on total score. Since $p_u$ is the proportion of the
total group and $P_u$ is the proportion of the upper quarter, it follows
that $P_u = 4p_u$. The value $p_l$, giving the corresponding proportion of
the total group for the lower quarter, is found by taking the value of
Chesire's $c$ corresponding to the given item-difficulty and item-$r$. The
values of $P_u$ and $P_l$ were obtained from the complete entries of
Table I.

Figure 1* represents the relationship between values of $r$, $P_u$
and $P_l$. The figure was constructed by using $P_l$ as abscissa and $P_u$
as ordinate and plotting all the points for a fixed value of $r_{it}$. The
points for a fixed value of $r_{it}$ were then joined by a smooth curve.

To use the abac, select the upper quarter on the basis of total
score on the test (we have found it convenient to select a random
sample of 50 cases from each of the extreme quarters) and determine
for the item $i$ the proportion of the sample passing the item. This is
the value of $P_u$. Similarly determine the value $P_l$, the proportion
passing the item in a sample from the lowest quarter. Plot the point
($P_l$, $P_u$), and the value of $r_{it}$ may then be read by interpolation
between the two correlation lines adjacent to the point. While negative
values of $r_{it}$ are not graphed, these may be found by reversing $P_l$
and $P_u$, reading $+r_{it}$, and attaching the negative sign to the value
read.
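The data preparation for the abac can be sketched in Python as follows.
This is a minimal illustration under stated assumptions: it uses every
case in each extreme quarter rather than a random sample of 50, ties in
total score are broken by sort order, and the function name and the
small data set are hypothetical.

```python
def extreme_quarter_proportions(total_scores, item_passed):
    """Return (P_u, P_l): proportions passing the item in the highest
    and lowest quarters of the total-score distribution.

    total_scores[i] is the total test score of examinee i;
    item_passed[i] is 1 if examinee i passed the item, else 0.
    """
    n = len(total_scores)
    q = n // 4  # number of cases in a quarter
    order = sorted(range(n), key=lambda i: total_scores[i])
    lowest, highest = order[:q], order[-q:]
    P_l = sum(item_passed[i] for i in lowest) / q
    P_u = sum(item_passed[i] for i in highest) / q
    return P_u, P_l


# Made-up scores for twelve examinees, for illustration only.
totals = [12, 30, 25, 8, 19, 27, 15, 22, 5, 28, 17, 10]
passed = [0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0]
P_u, P_l = extreme_quarter_proportions(totals, passed)
print(P_u, P_l)
```

The pair ($P_l$, $P_u$) so obtained is the point to be plotted on the abac.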

Obviously abacs similar to this might be prepared to give the
item-test correlation on the basis of any desired symmetrical division
of the criterion group by varying the value given to Chesire's
constant, $a$. Such an abac has been prepared for division at the median
and is given in Figure 2. Here $P_u$ and $P_l$ represent the proportion of
the upper and lower halves who pass the item.

Since the distribution of total scores is defined as normal and
hence symmetrical, the proportion of the total group passing the item
is given by

$$D = \frac{P_u + P_l}{2}. \tag{5}$$


[Figure 2. Axes: $P_l$, percentage of lower fifty per cent passing the
item; $P_u$, percentage of upper fifty per cent passing the item.]

FIGURE 2
Abac for Item-Test Correlation from Percentage of
Upper and Lower Fifty Per Cent Passing the Item

* Since developing this abac, the writer has learned that a similar one has
been previously developed by M. W. Richardson.

If we consider another common method of measuring item validity,
namely, the difference between the proportion of passes in the
upper and lower quarters, ULD, we see that it, too, may be represented
on the coordinates of Figure 1. The equation relating ULD to $P_u$ and
$P_l$ is, of course,

$$P_u = P_l + \mathrm{ULD}, \tag{6}$$

representing a family of straight lines, all with slope +1 and
y-intercepts equal to the observed difference. The lines corresponding to
ULD = 0, ULD = .20, and ULD = .60 are sketched on Figure 1.

Thus the abac might be used to yield, from a knowledge of $P_u$
and $P_l$ alone, values of upper-lower quarter difference, item difficulty
and $r_{it}$. So many lines would, of course, be too confusing and the first
two values may be computed from equations (5) and (6).
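Equations (5) and (6) amount to a half-sum and a difference, as the
following short Python sketch shows. The function name and the input
values are hypothetical and serve only as an illustration.

```python
def difficulty_and_uld(P_u, P_l):
    """Item difficulty D, equation (5), and upper-lower difference ULD,
    equation (6) solved for ULD."""
    D = (P_u + P_l) / 2.0
    ULD = P_u - P_l
    return D, ULD


D, ULD = difficulty_and_uld(P_u=0.70, P_l=0.30)
print(round(D, 2), round(ULD, 2))
```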

Besides the two measures of item validity thus far discussed,
namely $r_{it}$ and ULD, another has been used, and might be more widely
used if its computation were not so tedious. This index is the
critical ratio of ULD,

$$\mathrm{C.R.} = \frac{P_u - P_l}{\sigma_{(P_u - P_l)}}. \tag{7}$$

This may be rewritten in terms of $P_u$, $P_l$ and $N$, the number of
cases in a quarter, by substituting the expression for the standard
error of the difference between two proportions, thus


[Figure 3. Axes: $P_l$, percentage of lower group passing the item;
$P_u$, percentage of upper group passing the item.]

FIGURE 3
Abac for Computing the Critical Ratio of the Difference between Percentage of
Upper Group and Percentage of Lower Group Passing the Item.
Fifty Cases in Each Group.










$$\mathrm{C.R.} = \frac{P_u - P_l}
  {\dfrac{\sqrt{P_u(1 - P_u) + P_l(1 - P_l)}}{\sqrt{N}}}. \tag{8}$$

Since for the analysis of any particular test the number of cases in
the quarters will be the same for all the items, it will be convenient
to write this as:

$$\mathrm{C.R.} = \sqrt{N}\,
  \frac{P_u - P_l}{\sqrt{P_u(1 - P_u) + P_l(1 - P_l)}}, \tag{9}$$





where $\sqrt{N}$ is a constant. For samples of 50 cases from each extreme
quarter, the values of C.R. for arbitrary values of $P_u$ and $P_l$ have
been computed and plotted, and the points for constant C.R. joined by a
smooth curve in Figure 3, an abac for determining the C.R. as a
measure of item validity.

To use the abac, the values of $P_u$ and $P_l$ for a given item are
determined as before, the point plotted, and the value of C.R. read from
the graph. This abac can be used directly only for samples of 50
cases. If $N \neq 50$, all of the values read from the abac must be
multiplied by $\sqrt{N/50}$, or $.1414\sqrt{N}$. As in Figure 1, negative
values may be obtained by reversing $P_u$ and $P_l$ in plotting, and then
reading +C.R. This same abac may, of course, be used to obtain the critical
ratio of the difference of any two proportions, regardless of whether
or not they are based on the two extreme quarters, provided only
that each group contains $N$ cases.
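Equation (9) and the rescaling rule for samples other than 50 can be
checked directly. The Python sketch below is illustrative only; the
function names are hypothetical, and the formula is undefined when both
proportions are exactly 0 or 1, since the standard error is then zero.

```python
from math import sqrt


def critical_ratio(P_u, P_l, N):
    """Critical ratio of the upper-lower difference, equation (9),
    for N cases in each of the two groups."""
    return sqrt(N) * (P_u - P_l) / sqrt(P_u * (1 - P_u) + P_l * (1 - P_l))


def rescale_abac_reading(cr_for_50, N):
    """Convert a C.R. read from the 50-case abac to groups of N cases."""
    return cr_for_50 * sqrt(N / 50)


cr_50 = critical_ratio(0.70, 0.30, N=50)
cr_200 = critical_ratio(0.70, 0.30, N=200)
print(round(cr_50, 3), round(cr_200, 3))
print(round(rescale_abac_reading(cr_50, 200), 3))  # agrees with cr_200
```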

Finally, it should be pointed out that since the item-test correla- 
tion may now be computed with little labor from the same data as are 
required for the simplest sort of validity index, this method with its 
greater theoretical superiority should replace the use of the cruder 
indices. 


REFERENCES 


1. Adkins, Dorothy C. A rational comparison of item-selection techniques. 
Psychol. Bull., 1938, 35, 655. 

2. Chesire, L., Saffir, M., and Thurstone, L. L. Computing diagrams for the 
tetrachoric correlation coefficient. Chicago: Univ. Chicago Bookstore, 1933. 

3. Handy, N., and Lentz, T. F. Item value and test reliability. J. educ.
Psychol., 1934, 25, 703-708. 

4. Lentz, T. F., Hirschstein, B., and Finch, F. H. Evaluation of methods of 
evaluating test items. J. educ. Psychol., 1932, 23, 344-350. 

5. Long, J. A., and Sandiford, P. The validation of test items. Bull. no. 3, 
Dept. Educ. Res., University of Toronto, 1935. 

6. Pintner, R., and Forlano, G. A comparison of methods of item selection for
a personality test. J. appl. Psychol., 1937, 21, 648-652.

7. Richardson, M. W. The relation between the difficulty and differential
validity of a test. Psychometrika, 1936, June, 1, 33-50.

8. Richardson, M. W. Notes on the rationale of item analysis. Psychometrika,
1936, March, 1, 69-76.

9. Richardson, M. W., and Kuder, G. F. The theory of the estimation of test
reliability. Psychometrika, 1937, 2, 151-160.

10. Swineford, Frances. Validity of test items. J. educ. Psychol., 1936, 27,
68-78.




















PSYCHOMETRIKA—VOL. 5, NO. 1 
MARCH, 1940 


TIME SCORES AND FACTOR ANALYSIS 


H. D. LANDAHL 
The University of Chicago 


The assumption of specific operation times leads to equations 
similar to those which are the basis for a factor analysis but in 
which the time scores replace item scores. An analysis of time scores 
is shown to lead to results equivalent, in a first approximation, to 
those obtained from an analysis of item scores. It is suggested that 
by modifying the factor technique it may be possible to use addi- 
tional information to check with the results from an analysis. 


In approaching a problem by the method of factor analysis it 
may be desirable to use time scores. The question may then arise as 
to how the factors obtained will compare with the factors derived 
from item scores. We shall show that the factors will be essentially 
the same in either case. In the solution of the problem it is necessary 
to identify the factors obtained. Any method of using additional data 
to clarify the meaning of a factor or any method of identifying a fac- 
tor with some particular process would seem desirable. By introduc- 
ing a rather commonly made assumption we shall show that under 
certain conditions it is possible, at least in principle, to identify some 
process or part of a process with a factor obtained from tests using 
time scores. 

For many tests it is expedient to set a time limit and to take as 
the score the total number of responses made, some provision being 
made for errors. Frequently the errors and omissions are few, and 
the rate of performance is fairly constant over a considerable length 
of time. Under these conditions the reciprocal of the scores per unit 
of time, or the time per item obtained from a time limit test, could 
be interchanged with the time score per item obtained from timing 
a fixed number of items. 

In a particular test $j$ let a certain operation $k$, which requires a
time $t_{ki}$ by individual $i$, occur $n_{jk}$ times per unit time. Let $K$
be the number of specific independent types of operations performed in a
group of tests. The number $K$ will naturally vary from individual to
individual, since each one performs the task $j$ differently. However,
tests can be so constructed that most of the operations will be
performed by almost all of the individuals. Then under the above
conditions, we have for the time score (time per unit score), $T_{ji}$, on
test $j$ by individual $i$,

$$T_{ji} = \sum_{k=1}^{K} n_{jk}\, t_{ki}. \tag{1}$$

Equation (1) could be taken as a definition, since any time can be
broken into smaller arbitrary units. But such a definition would not
produce any results. Let us therefore assume that each operation
time $t_{ki}$, a statistical mean value, is the smallest functional unit to
which it can be broken down, i.e., either the time $t_{ki}$ is consumed for
the operation $k$ or no part of the operation is performed. Under these
conditions the times $t_{ki}$ $(k = 1, 2, \ldots, K)$ are characteristics of
the individual $i$. The numbers $n_{jk}$ (or fractions, since $n_{jk}$ is
defined per unit score) vary from test to test.
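Equation (1) is a matrix product of test weights and individual operation
times. The Python sketch below illustrates it with made-up numbers for two
tests, three operations and two individuals; the function name and all
values are hypothetical.

```python
def time_scores(n, t):
    """Equation (1): T[j][i] = sum over k of n[j][k] * t[k][i].

    n[j][k] is the number of occurrences of operation k per unit score
    on test j; t[k][i] is the time operation k takes individual i.
    """
    K = len(t)
    individuals = range(len(t[0]))
    return [[sum(n_j[k] * t[k][i] for k in range(K)) for i in individuals]
            for n_j in n]


n = [[1.0, 2.0, 0.0],   # test 0
     [0.5, 0.0, 3.0]]   # test 1
t = [[0.2, 0.3],        # operation 0: times for individuals 0 and 1
     [0.5, 0.4],        # operation 1
     [1.0, 1.5]]        # operation 2
T = time_scores(n, t)
print([[round(v, 2) for v in row] for row in T])
```

The times in `t` are characteristics of the individuals, while the weights
in `n` change from test to test, which is the structure the text ascribes
to equation (1).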

Before proceeding further, let us point out some differences be- 
tween the above assumption and the approach recorded in various 
papers in the first volumes of Philosophische Studien. In the latter 
works, complication times were obtained which were consistent for 
not too complex situations. However, for more complex tasks it be- 
came evident that the subtractive procedure was not justified, and as 
a result, the technique was criticized. In order to use the subtractive 
procedure it is necessary to limit our assumption by two conditions. 
First, the times $t_{ki}$ must be entirely independent of one another. If
two times which are fairly large can overlap one another even to a 
rather small extent, the slight overlap will obscure a small additional 
complication which caused the overlapping to occur. The overlapping 
of the two times, that is, the simultaneous occurrence of parts of two 
operations, might then still be small enough so that equation (1) 
could be used to advantage. The second requirement is that the values
$n_{jk}$ must be identical in the two situations necessary to determine the
difference. The addition of one or more operations may slightly change 
certain n’s, as certain tasks need no longer be performed. Again, in 
a subtractive procedure an additional small operation time might be 
obscured or conceivably come out negative, while equation (1) would 
still be valid. However, the objections mentioned above do apply in 
a different way to equation (1), a fact which should be kept in mind. 
Of particular importance is the possibility that the introduction of a 
certain operation may make possible the simultaneous function of 
two others, or may tend to inhibit or delay another, the result being 
noticeable in an analysis as an intercorrelation between the opera- 
tions. 

In a particular battery of tests it may be that certain operations 
always occur together, and there may be several of such groupings. 



















From these tests it would not be possible in any way to recognize this 
condition. The true dimensionality of the system could not be rec- 
ognized, though in other groups of tests it might be determinable. 
Hence, let all such groupings be made and let $t'_{ki}$ be the new times,
some of which are the sum of several $t_{ki}$, so that now there are $K' < K$
operation groups which are fundamental to the particular group of 
tests. For simplicity let us take the case in which such a grouping is 
not necessary or has already been made and in which the primes have 
been dropped. We may then use equation (1). 

Equation (1) is in a form which lends itself to the method of
factor analysis (1, 2, 3, 5). But, using correlation coefficients, it will
be the variance in the $t$'s and not the values of $t$ which will be
important. By changing the factor analysis procedure with this in mind,
so that covariances from time scores are used, it may be possible to
obtain values corresponding to $n_{jk}$ and $t_{ki}$. Certain difficulties
may arise, but we need not be concerned with them here. Some of the time
elements $t_{ki}$ derived through such a method should be identified in
other situations (as a particular reaction time or part thereof), and,
if so, the corresponding values $n_{jk}$ may possibly be obtained by simple
enumeration. Such information would constitute a check on the factor
procedure if it can be obtained. By using additional simple tests
on individuals having one rather high value $t_{ki}$ and the rest low, or
vice versa, the process of identifying the time elements might be
simplified.

In equation (1) it is evident that all terms in $t$ are linear,
although an apparent non-linearity could occur if there were intrinsic
intercorrelations between certain $t_{ki}$. All the values of $t_{ki}$ must
also be positive and finite. If a certain time $t_{ki}$ were infinite,
equation (1) would not hold, since then certain items could not be solved
and hence were omitted by the individual $i$. He would then have an apparent
$t_{ki}$ since a certain time would have been spent attempting the task. If
somehow the individual recognized those operations he could not perform
and so omitted them, then those $n_{jk}$ would be zero in his case and
the test would be a different task for him. Also, the values $n_{jk}$ must
be greater than or equal to zero.

The above discussion may perhaps be clarified by illustration
with an example based in part on data from a test devised by the
author for other purposes.* It is necessary to give a brief description
of the test as adapted for motor response. Three symbols are used,
L, R, and T. If L is presented alone, the left key is depressed.
Similarly R calls for response to the right key and T for both keys
together (or a third key). If a symbol occurs twice, as LL, then the
response is again L so that the left key is depressed. If, however, two
different symbols are presented then the response is to the third
symbol, i.e., for RT the response is to L, or the left key is depressed. If
now three symbols are presented, the first two symbols are treated as
a pair which produces a mental response according to the above rules.
The symbol of this response is taken together with the next or third
symbol as a pair, and as there are no more symbols in this case, the
response to the pair is given. As an example, RTT is responded to as
R, since the first pair, RT, becomes L. Putting L with the third
symbol T, we have a pair LT whose response is R. Similarly, TRT is
responded to as R and RRL as T, etc. The procedure is similar for four
or more symbols except for the number of processes necessary to
obtain the answer.

* This test is designated the ABC test by L. L. Thurstone in the Eighth
Grade Battery, as yet unpublished.
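The pairing rule just described can be stated compactly as a left-to-right
reduction. The following Python sketch is an illustration of the rule as
given in the text; the function names `combine` and `respond` are
hypothetical.

```python
from functools import reduce

SYMBOLS = {"L", "R", "T"}


def combine(first, second):
    """Response to a pair: the same symbol if the two are alike,
    otherwise the third symbol."""
    if first == second:
        return first
    (third,) = SYMBOLS - {first, second}
    return third


def respond(problem):
    """Reduce a problem of one or more symbols pair by pair,
    from left to right, to the single response symbol."""
    return reduce(combine, problem)


print(respond("RTT"))  # R
print(respond("TRT"))  # R
print(respond("RRL"))  # T
```

A problem of $m$ symbols requires $m - 1$ applications of `combine`, which
is the sense in which longer problems differ only in the number of
processes required.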

A number of problems varying in length from one symbol to 
eight symbols were prepared for presentation. From these, ten cards 
were selected for each of the eight levels of difficulty. The results for
one subject are given. The time in seconds is the median of ten trials; 
each trial had ten items and the time score on each trial was the me- 
dian of ten correct responses. Failures to respond and wrong re- 
sponses were not included. The subject was not informed of errors or 
of the difficulty of each item to be presented. 

















Item Length    Response Time    P.E.    Time Difference
     1              .59          .02         .59
     2              .83          .03         .24
     3             1.34          .09         .51
     4             1.95          .13         .61
     5             2.50          .18         .55
     6             3.00          .14         .50
     7             3.88          .29         .88
     8             4.59          .15         .71








The first column gives the number of symbols in the problem. 
The second column gives the median response time. The third column 
gives an estimate of the probable error from the ten trials. The last 
column gives the differences between successive response times, with 
the first time repeated. The procedure is essentially the method of 
complication times. 
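The last column of the table follows from the second by successive
subtraction, as the short Python sketch below shows. The response times
are those of the table; the function name is hypothetical and the rounding
to two decimals is an assumption made to match the table's precision.

```python
response_times = [0.59, 0.83, 1.34, 1.95, 2.50, 3.00, 3.88, 4.59]


def time_differences(times):
    """First time repeated, then differences between successive times."""
    diffs = [times[0]]
    for previous, current in zip(times, times[1:]):
        diffs.append(round(current - previous, 2))
    return diffs


differences = time_differences(response_times)
print(differences)
```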

Consider now the figures in the last column. One might expect 
that the first two values of the last column might be different from 
one another and from the rest of the figures. On the other hand it 
















would be reasonable to expect the rest of the figures in the column to 
be the same, since each value represents an amount of time to per- 
form a similar task. Within the error of the data, the last six figures 
could well be values measuring the same process, though there seems 
to be a slight increase down the column. 

Suppose now a set of values similar to those of the last column 
had been obtained from each of a group of individuals and an analysis 
had been made of these variables. One would then expect that the 
last six variables would cluster together, even though there were a 
significant increase down the column for the average of the individ- 
uals. The factor, if it occurred, could then be identified with the pro- 
cess represented by the difference times. Now the second variable is 
certainly different from the rest, at least in magnitude. One might 
expect it to have a rather low correlation with the other variables. 
But the first variable, so far as magnitude is concerned, is indistin- 
guishable from the last six. However, it is undoubtedly composed of 
several obvious factors and would perhaps have a low correlation with 
the other variables. 

Suppose that the above system could have been represented by 
three factors, the first variable, the second variable, and the last six 
variables. If, instead of analyzing the variables of the last column of 
the table, we had treated the second column, we should have obtained 
in this case the same factors; but the last six variables, instead of clus- 
tering together, would fan out toward the third factor from the first 
two, and they would do so in the order they occur. If covariances 
could be used in the analysis so that the relative lengths of the vectors 
would be correct, then the projections of the last six variables on the 
third axis would also increase systematically. 

In a situation similar to the above example it would be relatively 
simple to obtain results which would verify or contradict certain as- 
sumptions regarding the nature of particular processes. In a general 
case it would be more difficult to make use of the above suggestions. 
However, at least in some situations the above suggestions may be 
used to advantage. If a factor can be identified with a particular part 
of the time taken for some process, then it would be easier, in turn, 
to attempt to analyze the mechanism in terms of a more concrete neu- 
rological mechanism according to the methods of N. Rashevsky (4). 

Let us now attempt to compare the results obtained by an analysis
of time scores [equation (1)] with those obtained by an analysis
of item scores. Let us define all item scores, $S_{ji}$, as the number of
items performed per unit time on test $j$ by individual $i$, i.e., the score
in the fixed time divided by the time. Then, as stated above, we have
for the relation between the item scores and time scores,

$$S_{ji} = 1/T_{ji}. \tag{2}$$

Now the score $S_{ji}$ may be given by

$$S_{ji} = \sum_{p=1}^{P} a_{jp}\, x_{pi}, \tag{3}$$

where $a_{jp}$ is the weight in test $j$ of the factor $p$ and $x_{pi}$ is the
ability of individual $i$ in the factor $p$ (5).

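Let us define $\bar{t}_k$ as the average time for the operation $k$ of the
$N$ individuals used in the analysis, or

$$\bar{t}_k = \frac{1}{N}\sum_{i=1}^{N} t_{ki}, \tag{4}$$

and let

$$t_{ki} = \bar{t}_k + \tau_{ki}. \tag{5}$$

Now substituting equation (5) into (1), we have

$$T_{ji} = \sum_k n_{jk}\bar{t}_k
  \left(1 + \frac{\sum_k n_{jk}\tau_{ki}}{\sum_k n_{jk}\bar{t}_k}\right). \tag{6}$$

Introducing equation (6) into (2), we have,

$$S_{ji} = \frac{1}{\sum_k n_{jk}\bar{t}_k}
  \left[1 - \frac{\sum_k n_{jk}\tau_{ki}}{\sum_k n_{jk}\bar{t}_k}
  + \left(\frac{\sum_k n_{jk}\tau_{ki}}{\sum_k n_{jk}\bar{t}_k}\right)^{2}
  - \cdots\right], \tag{7}$$

or approximately, dropping second and higher degree terms,

$$S_{ji} = \frac{1}{\left(\sum_k n_{jk}\bar{t}_k\right)^{2}}
  \sum_k n_{jk}\left(\bar{t}_k - \tau_{ki}\right). \tag{8}$$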

In order to justify the use of the approximation, which holds
only when the second term in parentheses in equation (6) or (7) is
sufficiently smaller than unity, we may note that for most individuals
$\tau_{ki}$ is sometimes positive and sometimes negative and the numerator
sum is zero for the "average" individual of the group. Also, it should
be remembered that the variance increases with the mean. However,
for the extremely slow individuals with large $\sum_k \tau_{ki}$ and hence
large $T_{ji}$, the approximation is not valid. There will be few such
individuals, so we shall here neglect the error introduced.
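The quality of the first-order approximation can be examined numerically.
The Python sketch below compares the exact item score $1/T_{ji}$ with the
approximate form $\sum_k n_{jk}(2\bar{t}_k - t_{ki}) / (\sum_k n_{jk}\bar{t}_k)^2$,
which is the first-order expression written in terms of $t_{ki}$. All
numbers and function names are hypothetical and chosen only for
illustration.

```python
def exact_item_score(n_j, t_i):
    """Item score as the reciprocal of the time score: 1 / sum(n * t)."""
    return 1.0 / sum(n * t for n, t in zip(n_j, t_i))


def approx_item_score(n_j, t_i, t_bar):
    """First-order approximation: sum(n * (2 * t_bar - t)) / sum(n * t_bar)**2.

    Note that 2 * t_bar - t equals t_bar minus the deviation (t - t_bar).
    """
    mean_T = sum(n * tb for n, tb in zip(n_j, t_bar))
    numerator = sum(n * (2 * tb - t) for n, t, tb in zip(n_j, t_i, t_bar))
    return numerator / mean_T ** 2


n_j = [2.0, 1.0, 3.0]       # made-up weights of three operations in test j
t_bar = [0.5, 1.0, 0.2]     # made-up group mean operation times
typical = [0.55, 0.9, 0.22]  # an individual close to the group means
slow = [1.0, 2.0, 0.4]       # an individual twice as slow throughout

for t_i in (typical, slow):
    print(round(exact_item_score(n_j, t_i), 4),
          round(approx_item_score(n_j, t_i, t_bar), 4))
```

For the individual near the means the two values agree closely; for the
individual twice as slow the approximate score falls to zero while the
exact score does not, which illustrates why the approximation is said not
to hold for extremely slow individuals.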

In equation (8) the value $\bar{t}_k - \tau_{ki}$ may be termed a
complementary time score, since the smaller this score, the larger the
time score and the smaller the item score. Let us define as this
complementary score

$$\theta_{ki} = \bar{t}_k - \tau_{ki} = 2\bar{t}_k - t_{ki}. \tag{9}$$

If the mean score for a particular test were 100 seconds and an
individual obtained a score of 86 seconds, his complementary score
would be 114. We may then write for equation (8)








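$$S_{ji} = \sum_{k=1}^{K} \alpha_{jk}\, \xi_{ki}, \tag{10}$$

where

$$\alpha_{jk} = \frac{n_{jk}}{\sum_k n_{jk}\bar{t}_k}, \tag{11}$$

$$\xi_{ki} = \frac{\theta_{ki}}{\sum_k n_{jk}\bar{t}_k}. \tag{12}$$

The denominator in each case is the mean score of the group on test
$j$, since, using equations (4) and (1),

$$\sum_k n_{jk}\bar{t}_k = \frac{1}{N}\sum_i \sum_k n_{jk} t_{ki}
  = \frac{1}{N}\sum_i T_{ji}. \tag{13}$$

If we now set

$$\alpha_{jk} = a_{jp}, \tag{14}$$

$$\xi_{ki} = x_{pi}, \tag{15}$$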

(or, strictly, $\alpha_{jk}$ proportional to $a_{jp}$ and $\xi_{ki}$
proportional to $x_{pi}$), which may be done since equations (1) and (3) are
both assumed to be unique and because of the concepts involved in $n$ and
$a$ and in the subscripts $k$ and $p$, then equation (1) becomes equation
(3); and to the approximation involved in passing from equation (7) to (8),
the analysis of equation (1) is identical to the analysis of (3) under the
conditions initially imposed. From equation (9) a correlation
between one test and a second test, the latter being scored in time units,
would be, to a close approximation, the negative of the correlation
between the tests with the latter test scored by items. Since a $T$ score
is essentially a reciprocal $S$ score, the analysis of a correlation table
derived from reciprocal item scores would lead to essentially the same
results as the analysis based on the item scores directly, except that
one might be led to expect that there may be a difference in
sharpness of the planes in the solution by the rotational procedure (5, 6).







By introducing an assumption of specific operation times which
are additive under certain conditions we have obtained the relatively 
simple equation (1) which is of the form of the basic equation for a 
factor analysis. Equation (1) is shown to be very nearly equivalent 
to the latter basic equation. It is suggested that the treatment of data 
on the basis of equation (1) has the advantage that it may then be 
possible to identify a factor with a certain time taken for a psycholog- 
ical process or part thereof. Such an identification would be facili- 
tated if covariances instead of correlations could be analyzed so that 
the relative magnitudes of the projections of variables on a factor 
would be obtainable. 


REFERENCES 


1. Holzinger, K. J. Preliminary report on Spearman-Holzinger unitary trait
study. Chicago: Univ. Chicago Press, 1935.

2. Hotelling, H. Analysis of a complex of statistical variables into principal
components. J. educ. Psychol., 1933, 24, 417-441.

3. Kelley, T. L. Essential traits of mental life. Cambridge: Harvard Univ.
Press, 1935.

4. Rashevsky, N. Advances and applications of mathematical biology. Chicago:
Univ. Chicago Press, 1940.

5. Thurstone, L. L. Vectors of mind. Chicago: Univ. Chicago Press, 1935.

6. Thurstone, L. L. A new rotational method in factor analysis. Psychometrika,
1938, 3, 199-218.























